* Understanding Flow API action RSS
From: Ivan Malov @ 2021-12-29 14:34 UTC
To: Adrien Mazarguil, Thomas Monjalon, dev; +Cc: Andrew Rybchenko

Hi all,

In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
to provide "Queue indices to use". But it is unclear whether the order of
elements is meaningful or not. Does that matter? Can queue indices repeat?

An ethdev may have a "global" RSS setting with an indirection table of some
fixed size (say, 512). As far as flow rules are concerned, does that size
matter?

When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?

Please advise.
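To make the first question concrete, here is a minimal C sketch. It assumes, for illustration only, that a PMD uses the action's 'queue' array directly as its indirection table, i.e. queue[hash % queue_num]. The struct is a hypothetical local stand-in for the 'queue_num' and 'queue' members of 'struct rte_flow_action_rss' (the real structure in 'rte_flow.h' has more fields), so the sketch builds without DPDK. Under that assumption, reordering the array changes which queue a given hash reaches.

```c
#include <stdint.h>

/* Hypothetical local stand-in for the two members of
 * 'struct rte_flow_action_rss' this question is about; the real
 * structure in rte_flow.h also carries func, level, types, key_len
 * and key. */
struct rss_queue_list {
	uint32_t queue_num;    /* number of entries in 'queue' (must be > 0) */
	const uint16_t *queue; /* "Queue indices to use" */
};

/* Assumed interpretation for illustration: the array is used as-is as an
 * indirection table, so both the entry order and repeated entries change
 * the result. */
static uint16_t
pick_queue(const struct rss_queue_list *q, uint32_t hash)
{
	return q->queue[hash % q->queue_num];
}
```

For example, with queues {0, 1, 2, 3} a hash of 5 selects queue 1, while the reversed array {3, 2, 1, 0} selects queue 2 for the same hash.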
* Re: Understanding Flow API action RSS
From: Thomas Monjalon @ 2022-01-04 12:41 UTC
To: Ivan Malov; +Cc: Adrien Mazarguil, dev, Andrew Rybchenko, orika

+Cc Ori Kam, rte_flow maintainer

29/12/2021 15:34, Ivan Malov:
> Hi all,
>
> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> to provide "Queue indices to use". But it is unclear whether the order of
> elements is meaningful or not. Does that matter? Can queue indices repeat?
>
> An ethdev may have a "global" RSS setting with an indirection table of some
> fixed size (say, 512). As far as flow rules are concerned, does that size
> matter?
>
> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
>
> Please advise.
* Re: Understanding Flow API action RSS
From: Stephen Hemminger @ 2022-01-04 16:54 UTC
To: Thomas Monjalon
Cc: Ivan Malov, Adrien Mazarguil, dev, Andrew Rybchenko, orika

On Tue, 04 Jan 2022 13:41:55 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> +Cc Ori Kam, rte_flow maintainer
>
> 29/12/2021 15:34, Ivan Malov:
> > Hi all,
> >
> > In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> > to provide "Queue indices to use". But it is unclear whether the order of
> > elements is meaningful or not. Does that matter? Can queue indices repeat?

The order probably doesn't matter; it is like the RSS indirection table:

  rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]

So you could play with multiple queues matching the same hash value, but that
would be uncommon.

> > An ethdev may have a "global" RSS setting with an indirection table of some
> > fixed size (say, 512). As far as flow rules are concerned, does that size
> > matter?

Global RSS is only used if the incoming packet does not match any rte_flow
action. If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
action, it takes precedence.

> > When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> > that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?

No, the default is always Toeplitz. This goes back to the original definition
of RSS, which is in Microsoft NDIS and uses Toeplitz.

DPDK should have more examples of using rte_flow. I have some samples,
but they aren't that useful.
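Stephen's answer has two ingredients: a Toeplitz hash and a table lookup. The sketch below is a plain software Toeplitz hash over an input byte string (per Microsoft's RSS specification, for IPv4 typically the source address, then the destination address, then the ports). The 40-byte key is the sample key from Microsoft's RSS hash verification documentation. Whether a particular PMD uses that key, or Toeplitz at all, for RTE_ETH_HASH_FUNCTION_DEFAULT is exactly what this thread is asking, so treat both as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Sample 40-byte RSS key from Microsoft's RSS verification examples.
 * Using it here is an assumption, not a statement about any PMD. */
static const uint8_t rss_key_ms[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Software Toeplitz hash: for every input bit that is set (most significant
 * bit first), XOR in the 32-bit window of the key that starts at that bit
 * position. Key bits past key_len are treated as zero. */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
	      const uint8_t *input, size_t input_len)
{
	uint32_t result = 0;
	/* The window starts at key bits 0..31 (assumes key_len >= 4). */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | (uint32_t)key[3];
	size_t next = 4; /* next key byte to feed into the window */

	for (size_t i = 0; i < input_len; i++, next++) {
		uint8_t feed = next < key_len ? key[next] : 0;

		for (int b = 7; b >= 0; b--) {
			if (input[i] & (1u << b))
				result ^= window;
			/* Slide the window one bit left, pulling in the next key bit. */
			window = (window << 1) | ((feed >> b) & 1u);
		}
	}
	return result;
}
```

Because each set input bit simply XORs in a key window, the hash is linear over GF(2): a single set bit at the very start of the input yields the first 32 bits of the key (0x6d5a56da), which makes a quick sanity check. Microsoft's documentation also publishes full verification vectors that such a sketch can be tested against.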
* Re: Understanding Flow API action RSS
From: Ivan Malov @ 2022-01-04 18:29 UTC
To: Stephen Hemminger
Cc: Thomas Monjalon, Adrien Mazarguil, dev, Andrew Rybchenko, orika

Hi Stephen,

On Tue, 4 Jan 2022, Stephen Hemminger wrote:

> On Tue, 04 Jan 2022 13:41:55 +0100
> Thomas Monjalon <thomas@monjalon.net> wrote:
>
>> 29/12/2021 15:34, Ivan Malov:
>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
>>> to provide "Queue indices to use". But it is unclear whether the order of
>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
>
> The order probably doesn't matter; it is like the RSS indirection table.

Sorry, but an RSS indirection table (RETA) assumes some structure: in it,
queue indices can repeat, and the order is meaningful. In DPDK, a RETA
may comprise multiple "groups" of 64 entries each.

The 'queue' array in flow action RSS does not follow the same terminology;
it does not reuse the definition of a RETA "group", etc. It is just
"queue indices to use": no definition of order, no structure.

The API contract is not clear, neither to users nor to PMDs.

>
>   rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
>
> So you could play with multiple queues matching the same hash value, but that
> would be uncommon.
>
>>> An ethdev may have a "global" RSS setting with an indirection table of some
>>> fixed size (say, 512). As far as flow rules are concerned, does that size
>>> matter?
>
> Global RSS is only used if the incoming packet does not match any rte_flow
> action. If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
> action, it takes precedence.

Yes, I know all of that. The question is how the PMD selects the RETA size
for this action. Can it select an arbitrary value? Or should it stick with
the "global" one (e.g. 512)? How does the user know the table size?

If the user simply wants to spread traffic across the given queues, the
effective table size does not matter to them, and the existing API contract
is fine. But if the user expects certain packets to hit specific queues,
they need to know the table size.

So, the question is whether users should build any expectations about the
effective table size and, if so, whether they are supposed to use the
"global" table size for that.

>
>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
>
> No, the default is always Toeplitz. This goes back to the original definition
> of RSS, which is in Microsoft NDIS and uses Toeplitz.

Then why have a dedicated enum named TOEPLITZ? Also, once again, the
documentation should state exactly which algorithm the DEFAULT choice
provides. Otherwise, it is very vague.

>
> DPDK should have more examples of using rte_flow. I have some samples,
> but they aren't that useful.
>

I could not agree more.

Thanks,
Ivan M.
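Ivan's point about table size can be shown with a small model. The sketch assumes (hypothetically) that a PMD fills an action-specific RETA by cycling through the requested queues in order, and then applies Stephen's formula. With three queues, a count that divides neither 8 nor 512 evenly, the same hash lands on different queues depending on the table size, so a sender that guesses the size wrong predicts the wrong queue.

```c
#include <stdint.h>

/* Hypothetical PMD behaviour assumed for this sketch: the action's RETA is
 * filled by cycling through the requested queues in order. */
static void
fill_reta(uint16_t *reta, uint32_t reta_size,
	  const uint16_t *queues, uint32_t queue_num)
{
	for (uint32_t i = 0; i < reta_size; i++)
		reta[i] = queues[i % queue_num];
}

/* Stephen's formula: rx queue = RETA[hash % RETA_size]. */
static uint16_t
reta_lookup(const uint16_t *reta, uint32_t reta_size, uint32_t hash)
{
	return reta[hash % reta_size];
}
```

For instance, with queues {0, 1, 2} and hash 13, an 8-entry table yields queue 2 while a 512-entry table yields queue 1. With a power-of-two queue count such as 4, both sizes would agree, which may hide the problem in simple tests.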
* Re: Understanding Flow API action RSS
From: Stephen Hemminger @ 2022-01-04 21:56 UTC
To: Ivan Malov
Cc: Thomas Monjalon, Adrien Mazarguil, dev, Andrew Rybchenko, orika

On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
Ivan Malov <ivan.malov@oktetlabs.ru> wrote:

> [...]
> Yes, I know all of that. The question is how the PMD selects the RETA size
> for this action. Can it select an arbitrary value? Or should it stick with
> the "global" one (e.g. 512)? How does the user know the table size?
>
> If the user simply wants to spread traffic across the given queues, the
> effective table size does not matter to them, and the existing API contract
> is fine. But if the user expects certain packets to hit specific queues,
> they need to know the table size.
>
> So, the question is whether users should build any expectations about the
> effective table size and, if so, whether they are supposed to use the
> "global" table size for that.

You are right; this area is completely undocumented. Personally, I would
really like it if rte_flow had a reference software implementation and all
the HW vendors had to make sure their HW matched the SW reference version.
But this is a case where the funding is all on the HW side, and no one has
the time or resources to do a complete SW version.

A sane implementation would configure RSS indirection across all Rx queues
that were available when the device was started, i.e. all queues that did
not have deferred start set. Then the application would start/stop queues
and use rte_flow to reach them.

But it doesn't appear the HW follows that model.
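Stephen's "sane implementation" can be sketched as follows. The 'rxq_state' struct is a hypothetical stand-in for per-queue configuration; its flag plays the role of 'rx_deferred_start' in DPDK's 'struct rte_eth_rxconf'. The function spreads the default RETA round-robin over every queue that starts together with the device and skips deferred ones, which would then be reachable only through rte_flow rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue view; the flag mirrors the role of
 * rx_deferred_start in struct rte_eth_rxconf. */
struct rxq_state {
	bool deferred_start;
};

/* Fill reta[0..reta_size) round-robin with every Rx queue that is NOT
 * deferred. Returns the number of eligible queues; if there are none,
 * the table is left untouched and 0 is returned. */
static uint32_t
build_default_reta(uint16_t *reta, uint32_t reta_size,
		   const struct rxq_state *rxq, uint16_t nb_rxq)
{
	uint32_t eligible = 0;

	for (uint16_t i = 0; i < nb_rxq; i++)
		if (!rxq[i].deferred_start)
			eligible++;
	if (eligible == 0)
		return 0;

	uint16_t q = 0;
	for (uint32_t i = 0; i < reta_size; i++) {
		/* Skip queues that the application starts later by itself. */
		while (rxq[q].deferred_start)
			q = (uint16_t)((q + 1) % nb_rxq);
		reta[i] = q;
		q = (uint16_t)((q + 1) % nb_rxq);
	}
	return eligible;
}
```

With four queues where queue 1 is deferred, a 6-entry table becomes 0, 2, 3, 0, 2, 3.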
* RE: Understanding Flow API action RSS
From: Ori Kam @ 2022-01-09 12:23 UTC
To: Stephen Hemminger, Ivan Malov
Cc: NBU-Contact-Thomas Monjalon (EXTERNAL), NBU-Contact-Adrien Mazarguil (EXTERNAL), dev, Andrew Rybchenko

Hi Stephen and Ivan

> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Tuesday, January 4, 2022 11:56 PM
> Subject: Re: Understanding Flow API action RSS
>
> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> Ivan Malov <ivan.malov@oktetlabs.ru> wrote:
>
> [...]
> > Sorry, but an RSS indirection table (RETA) assumes some structure: in it,
> > queue indices can repeat, and the order is meaningful. In DPDK, a RETA
> > may comprise multiple "groups" of 64 entries each.
> >
> > The 'queue' array in flow action RSS does not follow the same terminology;
> > it does not reuse the definition of a RETA "group", etc. It is just
> > "queue indices to use": no definition of order, no structure.
> >
> > The API contract is not clear, neither to users nor to PMDs.

From the API point of view, the queues in RSS are simply queue IDs; order
doesn't matter. Duplicating a queue may affect the spread, depending on the
HW/PMD. In the common case, each queue should appear only once, and the PMD
may duplicate entries to get the best performance.

> [...]
> > If the user simply wants to spread traffic across the given queues, the
> > effective table size does not matter to them, and the existing API contract
> > is fine. But if the user expects certain packets to hit specific queues,
> > they need to know the table size.

Just like you said, RSS simply spreads the traffic across the given queues.
If the application wants to send traffic to a specific queue, it should use
the QUEUE action.

> [...]
> > > DPDK should have more examples of using rte_flow. I have some samples,
> > > but they aren't that useful.
> >
> > I could not agree more.

Feel free to add, or suggest which examples are missing.

Best,
Ori
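Ori's remark that duplicating a queue affects the spread follows from the lookup formula: with a roughly uniform hash, each queue receives traffic in proportion to the number of table entries that point at it. This small helper (hypothetical, not a DPDK API) counts those entries.

```c
#include <stdint.h>
#include <string.h>

/* Count how many indirection-table entries point at each queue. With a
 * uniformly distributed hash, queue q receives roughly
 * counts[q] / reta_size of the hash space. Entries referring to queues
 * >= nb_queues are ignored. */
static void
reta_share(const uint16_t *reta, uint32_t reta_size,
	   uint32_t *counts, uint16_t nb_queues)
{
	memset(counts, 0, nb_queues * sizeof(*counts));
	for (uint32_t i = 0; i < reta_size; i++)
		if (reta[i] < nb_queues)
			counts[reta[i]]++;
}
```

A table of {0, 0, 1} gives queue 0 two of the three entries, i.e. about two thirds of the hash space rather than an even split.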
* RE: Understanding Flow API action RSS
From: Ivan Malov @ 2022-01-09 13:03 UTC
To: Ori Kam
Cc: Stephen Hemminger, NBU-Contact-Thomas Monjalon (EXTERNAL), NBU-Contact-Adrien Mazarguil (EXTERNAL), dev, Andrew Rybchenko

Hi Ori,

On Sun, 9 Jan 2022, Ori Kam wrote:

> From the API point of view, the queues in RSS are simply queue IDs; order
> doesn't matter. Duplicating a queue may affect the spread, depending on the
> HW/PMD. In the common case, each queue should appear only once, and the PMD
> may duplicate entries to get the best performance.

Look: in a DPDK PMD, one has a "global" RSS table. Consider the following
example: 0, 0, 1, 1, 2, 2, 3, 3, and so on. As you can see, queue indices
may repeat. They may also come in a different order: 1, 1, 0, 0, ...
The order is of great importance. If you send a packet to a DPDK-powered
server, you can know its hash value in advance. Hence, you can strictly
predict which RSS table entry this hash will point at, and that predicts
the target Rx queue.

So the questions one should attempt to clarify are as follows:
1) Is the 'queue' array ordered? (Does the order of elements matter?)
2) Can its elements repeat? (Is that *allowed* or *not allowed*?)

> [...]
> Just like you said, RSS simply spreads the traffic across the given queues.

Yes, to the given queues. The question is whether the 'queue' array has
RETA properties (order matters; elements can repeat) or not.

> If the application wants to send traffic to a specific queue, it should use
> the QUEUE action.

Yes, but that's not what I mean. Consider the following example. The user
generates packets with random IP addresses at machine A. These packets hit
DPDK at machine B. For a given *packet*, the sender (A) can compute its RSS
hash in software. This points to the RETA entry index. But, in order to
predict the exact *queue* index, the sender has to know the table (its
contents and its size).

For a "global" DPDK RSS setting, the table can be easily obtained with an
ethdev callback / API. Very simple: a fixed-size table that can be queried.
But how does one obtain similar knowledge for the RSS action?
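Ivan's "global" case is predictable because the table can be read back. In DPDK, the RETA is queried with 'rte_eth_dev_rss_reta_query()' into an array of 'struct rte_eth_rss_reta_entry64', each holding a 64-bit 'mask' and 'RTE_ETH_RETA_GROUP_SIZE' (64) queue indices, with the table size taken from 'reta_size' in 'struct rte_eth_dev_info'. The sketch below uses a local mirror of that structure (hypothetical names) so it builds without DPDK, and shows how a sender that knows the hash and the queried table would predict the Rx queue.

```c
#include <stdint.h>

#define GROUP_SIZE 64 /* mirrors RTE_ETH_RETA_GROUP_SIZE */

/* Local mirror of struct rte_eth_rss_reta_entry64, so the sketch builds
 * without DPDK headers. */
struct reta_entry64 {
	uint64_t mask;             /* which of the 64 slots are valid */
	uint16_t reta[GROUP_SIZE]; /* queue index per slot */
};

/* Predict the Rx queue for a known hash, given a queried "global" RETA of
 * reta_size entries split into groups of 64. Returns -1 if the slot is not
 * marked valid in its group's mask. */
static int
predict_rx_queue(const struct reta_entry64 *groups, uint32_t reta_size,
		 uint32_t hash)
{
	uint32_t idx = hash % reta_size;
	const struct reta_entry64 *g = &groups[idx / GROUP_SIZE];
	uint32_t slot = idx % GROUP_SIZE;

	if (!(g->mask & (UINT64_C(1) << slot)))
		return -1;
	return g->reta[slot];
}
```

With a 128-entry table filled as 0, 0, 1, 1, 2, 2, 3, 3, ..., hash 70 maps to entry 70 (group 1, slot 6) and therefore to queue 3. The open question in the thread is that no comparable read-back exists for the table behind an RSS flow action.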
* RE: Understanding Flow API action RSS
From: Ori Kam @ 2022-01-10 9:54 UTC
To: Ivan Malov
Cc: Stephen Hemminger, NBU-Contact-Thomas Monjalon (EXTERNAL), NBU-Contact-Adrien Mazarguil (EXTERNAL), dev, Andrew Rybchenko

Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Sunday, January 9, 2022 3:03 PM
> Subject: RE: Understanding Flow API action RSS
>
> Hi Ori,
>
> On Sun, 9 Jan 2022, Ori Kam wrote:
> [...]
> So the questions one should attempt to clarify are as follows:
> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> 2) Can its elements repeat? (Is that *allowed* or *not allowed*?)

From the API point of view, the array is ordered and may have repeating
elements.

> [...]
> Yes, to the given queues. The question is whether the 'queue' array has
> RETA properties (order matters; elements can repeat) or not.

Yes, order matters and elements can repeat.

> [...]
> Yes, but that's not what I mean. Consider the following example. The user
> generates packets with random IP addresses at machine A. These packets hit
> DPDK at machine B. For a given *packet*, the sender (A) can compute its RSS
> hash in software. This points to the RETA entry index. But, in order to
> predict the exact *queue* index, the sender has to know the table (its
> contents and its size).

Why does the application need this info?

> For a "global" DPDK RSS setting, the table can be easily obtained with an
> ethdev callback / API. Very simple: a fixed-size table that can be queried.
> But how does one obtain similar knowledge for the RSS action?

The RSS action was designed to allow a balanced traffic spread.
The size of the RETA is PMD-dependent: in some PMDs the size will be the
number of queues, in others it will be the number of queues rounded up to
a power of 2, so if the app requested 8 queues, the RETA will also have 8
entries. In any case, the PMD should use the given order; if the PMD needs
to expand the table, it should cycle through the application-requested
queues in the order they were given.

Best,
Ori
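Ori's description leaves the table size PMD-dependent. The sketch below models the two sizing policies he mentions (the enum and function names are hypothetical) together with in-order cycling, and shows that the same hash can reach different queues under the two policies when the queue count is not a power of two. It requires queue_num > 0.

```c
#include <stdint.h>

/* The two sizing policies described above (names are hypothetical). */
enum reta_size_policy {
	RETA_SIZE_EXACT,     /* table size == number of requested queues */
	RETA_SIZE_POW2_CEIL, /* number of queues rounded up to a power of 2 */
};

static uint32_t
rss_action_reta_size(uint32_t queue_num, enum reta_size_policy policy)
{
	uint32_t size = 1;

	if (policy == RETA_SIZE_EXACT)
		return queue_num;
	while (size < queue_num)
		size <<= 1;
	return size;
}

/* Entry i of the expanded table is queue[i % queue_num]: the requested
 * queues are cycled through in the order the application gave them. */
static uint16_t
rss_action_queue_for_hash(const uint16_t *queue, uint32_t queue_num,
			  enum reta_size_policy policy, uint32_t hash)
{
	uint32_t size = rss_action_reta_size(queue_num, policy);

	return queue[(hash % size) % queue_num];
}
```

With queues {5, 7, 9}, the exact policy builds 5, 7, 9 while the power-of-two policy builds 5, 7, 9, 5. Hash 7 therefore reaches queue 7 in the first case and queue 5 in the second, and under the power-of-two policy queue 5 receives about half of the hash space instead of a third.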
* RE: Understanding Flow API action RSS 2022-01-10 9:54 ` Ori Kam @ 2022-01-10 15:04 ` Ivan Malov 2022-01-10 17:18 ` Ori Kam 0 siblings, 1 reply; 10+ messages in thread From: Ivan Malov @ 2022-01-10 15:04 UTC (permalink / raw) To: Ori Kam Cc: Stephen Hemminger, NBU-Contact-Thomas Monjalon (EXTERNAL), NBU-Contact-Adrien Mazarguil (EXTERNAL), dev, Andrew Rybchenko Hi Ori, Many-many thanks for your commentary. The nature of 'queue' array in flow action RSS is clear now. I hope PMD vendors and API users share this vision, too. Propably, this should be properly documented. We'll see what we cad do in that direction. Please see one more question below. On Mon, 10 Jan 2022, Ori Kam wrote: > Hi Ivan, > >> -----Original Message----- >> From: Ivan Malov <ivan.malov@oktetlabs.ru> >> Sent: Sunday, January 9, 2022 3:03 PM >> Subject: RE: Understanding Flow API action RSS >> >> Hi Ori, >> >> On Sun, 9 Jan 2022, Ori Kam wrote: >> >>> Hi Stephen and Ivan >>> >>>> -----Original Message----- >>>> From: Stephen Hemminger <stephen@networkplumber.org> >>>> Sent: Tuesday, January 4, 2022 11:56 PM >>>> Subject: Re: Understanding Flow API action RSS >>>> >>>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK) >>>> Ivan Malov <ivan.malov@oktetlabs.ru> wrote: >>>> >>>>> Hi Stephen, >>>>> >>>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote: >>>>> >>>>>> On Tue, 04 Jan 2022 13:41:55 +0100 >>>>>> Thomas Monjalon <thomas@monjalon.net> wrote: >>>>>> >>>>>>> +Cc Ori Kam, rte_flow maintainer >>>>>>> >>>>>>> 29/12/2021 15:34, Ivan Malov: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is >>>>>>>> to provide "Queue indices to use". But it is unclear whether the order of >>>>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat? >>>>>> >>>>>> The order probably doesn't matter, it is like the RSS indirection table. >>>>> >>>>> Sorry, but RSS indirection table (RETA) assumes some structure. 
In it, >>>>> queue indices can repeat, and the order is meaningful. In DPDK, RETA >>>>> may comprise multiple "groups", each one comprising 64 entries. >>>>> >>>>> This 'queue' array in flow action RSS does not stick with the same >>>>> terminology, it does not reuse the definition of RETA "group", etc. >>>>> Just "queue indices to use". No definition of order, no structure. >>>>> >>>>> The API contract is not clear. Neither to users, nor to PMDs. >>>>> >>>> From API in RSS the queues are simply the queue ID, order doesn't matter, >>> Duplicating the queue may affect the the spread based on the HW/PMD. >>> In common case each queue should appear only once and the PMD may duplicate >>> entries to get the best performance. >> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue >> indices may repeat. They may have different order: 1, 1, 0, 0, ... . >> The order is of great importance. If you send a packet to a >> DPDK-powered server, you can know in advance its hash value. >> Hence, you may strictly predict which RSS table entry this >> hash will point at. That predicts the target Rx queue. >> >> So the questions which one should attempt to clarify, are as follows: >> 1) Is the 'queue' array ordered? (Does the order of elements matter?) >> 2) Can its elements repeat? (*allowed* or *not allowed*?) >> >> From API point of view the array is ordered, and may have repeating elements. > >>> >>>>>> >>>>>> rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ] >>>>>> >>>>>> So you could play with multiple queues matching same hash value, but that >>>>>> would be uncommon. >>>>>> >>>>>>>> An ethdev may have "global" RSS setting with an indirection table of some >>>>>>>> fixed size (say, 512). In what comes to flow rules, does that size matter? >>>>>> >>>>>> Global RSS is only used if the incoming packet does not match any rte_flow >>>>>> action. 
>>>>>> If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS,
>>>>>> these take precedence.
>>>>>
>>>>> Yes, I know all of that. The question is how does the PMD select RETA size
>>>>> for this action? Can it select an arbitrary value? Or should it stick with
>>>>> the "global" one (e.g. 512)? How does the user know the table size?
>>>>>
>>>>> If the user simply wants to spread traffic across the given queues,
>>>>> the effective table size is a don't-care to them, and the existing
>>>>> API contract is fine. But if the user expects that certain packets
>>>>> hit some precise queues, they need to know the table size for that.
>>>>>
>>> Just like you said, RSS simply spreads the traffic to the given queues.
>>
>> Yes, to the given queues. The question is whether the 'queue' array
>> has RETA properties (order matters; elements can repeat) or not.
>>
>
> Yes, order matters and elements can repeat.
>
>>> If the application wants to send traffic to some queue, it should use the queue action.
>>
>> Yes, but that's not what I mean. Consider the following example. The user
>> generates packets with random IP addresses at machine A. These packets
>> hit DPDK at machine B. For a given *packet*, the sender (A) can
>> compute its RSS hash in software. This will point at the RETA
>> entry index. But, in order to predict the exact *queue* index,
>> the sender has to know the table (its contents, its size).
>>
> Why does the application need this info?
>
>> For a "global" DPDK RSS setting, the table can be easily obtained with
>> an ethdev callback / API. Very simple. Fixed-size table, and it can
>> be queried. But how does one obtain similar knowledge for RSS action?
>>
> The RSS action was designed to allow balanced traffic spread.
> The size of the RETA is PMD dependent: in some PMDs the size will be
> the number of queues, in others it will be the number of queues rounded up
> to a power of 2, so if the app requested 8 queues the RETA will also be 8.
> In any case, the PMD should use the given order; if the PMD needs to expand,
> it should cycle over the application-requested queues in the order they were given.
>
>
>>>
>>>>> So, the question is whether the users should or should not build
>>>>> any expectations of the effective table size and, if they should,
>>>>> are they supposed to use the "global" table size for that?
>>>>
>>>> You are right, this area is completely undocumented. Personally I would really like
>>>> it if rte_flow had a reference software implementation and all the HW vendors
>>>> had to make sure their HW matched the SW reference version. But this is a case
>>>> where the funding is all on the HW side, and no one has time or resources
>>>> to do a complete SW version.
>>>>
>>>> A sane implementation would configure RSS indirection across all
>>>> rx queues that were available when the device was started; i.e. all queues
>>>> that did not have deferred start set. Then the application would start/stop
>>>> queues and use rte_flow to reach them.
>>>>
>>>> But it doesn't appear the HW follows that model.
>>>>
>>>>
>>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
>>>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?

What do you think about the above question? In my opinion, DEFAULT should
let the PMD select whatever hash function / algorithm it may want to
select. Just some vendor-specific optimal choice.

If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
they can always specify enum TOEPLITZ. And the PMD must either
comply or reject. What do you think? Are we on the same page?

>>>>>>
>>>>>> No, the default is always Toeplitz. This goes back to the original definition
>>>>>> of RSS, which is in Microsoft NDIS and uses Toeplitz.
>>>>>
>>>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
>>>>> documentation should be more specific to say which algorithm exactly
>>>>> this DEFAULT choice provides.
>>>>> Otherwise, it is very vague.
>>>>>
>>>>>>
>>>>>> DPDK should have more examples of using rte_flow, I have some samples
>>>>>> but they aren't that useful.
>>>>>>
>>>>>
>>>>> I could not agree more.
>>>
>>> Feel free to add/suggest what examples are missing.
>>>
>>>>>
>>>>> Thanks,
>>>>> Ivan M.
>>>
>>> Best,
>>> Ori
>>>
> Best,
> Ori
>

Best regards,
Ivan M.
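Ori's description of the RSS action (the PMD picks a RETA size, for example the number of queues rounded up to a power of two, and fills it by cycling through the 'queue' array in the order given), together with Stephen's lookup formula, can be modelled in a few lines of C. This is a hypothetical sketch, not code from DPDK or any PMD: next_pow2(), build_reta() and reta_lookup() are illustrative names, and power-of-two rounding is only one of the sizing policies Ori mentions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: smallest power of two that is >= n. */
uint32_t next_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Hypothetical model of how a PMD could expand the 'queue' array of
 * struct rte_flow_action_rss into an indirection table (RETA): round the
 * size up to a power of two and fill it by cycling through the requested
 * queues in the order given. Returns the table size (0 on allocation
 * failure); the caller frees *reta.
 */
uint32_t build_reta(const uint16_t *queue, uint32_t queue_num, uint16_t **reta)
{
	uint32_t size;
	uint16_t *tbl;

	/* Queue IDs are 16-bit, so more than 65536 entries makes no sense. */
	assert(queue_num > 0 && queue_num <= 65536u);
	size = next_pow2(queue_num);
	tbl = malloc(size * sizeof(*tbl));
	if (tbl == NULL)
		return 0;
	for (uint32_t i = 0; i < size; i++)
		tbl[i] = queue[i % queue_num]; /* wrap around, keeping the order */
	*reta = tbl;
	return size;
}

/* rx queue = RETA[hash % RETA size], per the formula quoted in the thread. */
uint16_t reta_lookup(const uint16_t *reta, uint32_t size, uint32_t hash)
{
	return reta[hash % size];
}
```

With queue = {0, 1, 2} the table becomes 0 1 2 0, and with {2, 1, 0} it becomes 2 1 0 2. A packet whose hash is 6 (entry 6 % 4 = 2) therefore lands on queue 2 in the first case and on queue 0 in the second, so the order of the array matters; the wrap-around also gives the first queue twice the share of the others, which shows how duplicated entries skew the spread.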
* RE: Understanding Flow API action RSS
  2022-01-10 15:04 ` Ivan Malov
@ 2022-01-10 17:18   ` Ori Kam
  0 siblings, 0 replies; 10+ messages in thread

From: Ori Kam @ 2022-01-10 17:18 UTC
To: Ivan Malov
Cc: Stephen Hemminger, NBU-Contact-Thomas Monjalon (EXTERNAL),
    NBU-Contact-Adrien Mazarguil (EXTERNAL), dev, Andrew Rybchenko

Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Subject: RE: Understanding Flow API action RSS
>
> Hi Ori,
>
> [...]
> >>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> >>>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
>
> What do you think about the above question? In my opinion, DEFAULT should
> let the PMD select whatever hash function / algorithm it may want to
> select. Just some vendor-specific optimal choice.
>
> If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
> they can always specify enum TOEPLITZ. And the PMD must either
> comply or reject. What do you think? Are we on the same page?
>
Fully agree with you.
The same goes if the user doesn't supply the key: the PMD should select some default value.
Thread overview: 10+ messages
2021-12-29 14:34 Understanding Flow API action RSS Ivan Malov
2022-01-04 12:41 ` Thomas Monjalon
2022-01-04 16:54   ` Stephen Hemminger
2022-01-04 18:29     ` Ivan Malov
2022-01-04 21:56       ` Stephen Hemminger
2022-01-09 12:23         ` Ori Kam
2022-01-09 13:03           ` Ivan Malov
2022-01-10  9:54             ` Ori Kam
2022-01-10 15:04               ` Ivan Malov
2022-01-10 17:18                 ` Ori Kam