DPDK patches and discussions
From: Ori Kam <orika@nvidia.com>
To: Ivan Malov <ivan.malov@oktetlabs.ru>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>,
	"NBU-Contact-Adrien Mazarguil (EXTERNAL)"
	<adrien.mazarguil@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Subject: RE: Understanding Flow API action RSS
Date: Mon, 10 Jan 2022 09:54:35 +0000
Message-ID: <MW2PR12MB466626B11C75EDDA4D4D4173D6509@MW2PR12MB4666.namprd12.prod.outlook.com>
In-Reply-To: <1fa28b5-22f4-36f0-a4fe-2ceedad4434@oktetlabs.ru>

Hi Ivan,

> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Sent: Sunday, January 9, 2022 3:03 PM
> Subject: RE: Understanding Flow API action RSS
> 
> Hi Ori,
> 
> On Sun, 9 Jan 2022, Ori Kam wrote:
> 
> > Hi Stephen and Ivan
> >
> >> -----Original Message-----
> >> From: Stephen Hemminger <stephen@networkplumber.org>
> >> Sent: Tuesday, January 4, 2022 11:56 PM
> >> Subject: Re: Understanding Flow API action RSS
> >>
> >> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >> Ivan Malov <ivan.malov@oktetlabs.ru> wrote:
> >>
> >>> Hi Stephen,
> >>>
> >>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>
> >>>> On Tue, 04 Jan 2022 13:41:55 +0100
> >>>> Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>
> >>>>> +Cc Ori Kam, rte_flow maintainer
> >>>>>
> >>>>> 29/12/2021 15:34, Ivan Malov:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> >>>>>> to provide "Queue indices to use". But it is unclear whether the order of
> >>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
> >>>>
> >>>> The order probably doesn't matter, it is like the RSS indirection table.
> >>>
> >>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
> >>> may comprise multiple "groups", each one comprising 64 entries.
> >>>
> >>> This 'queue' array in flow action RSS does not stick with the same
> >>> terminology, it does not reuse the definition of RETA "group", etc.
> >>> Just "queue indices to use". No definition of order, no structure.
> >>>
> >>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>
> >> From API in RSS the queues are simply the queue ID, order doesn't matter,
> > Duplicating the queue may affect the spread based on the HW/PMD.
> > In the common case each queue should appear only once, and the PMD may duplicate
> > entries to get the best performance.
> 
> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> The order is of great importance. If you send a packet to a
> DPDK-powered server, you can know in advance its hash value.
> Hence, you may strictly predict which RSS table entry this
> hash will point at. That predicts the target Rx queue.
> 
> So the questions which one should attempt to clarify, are as follows:
> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> 
From the API point of view, the array is ordered and may contain repeated elements.

> >
> >>>>
> >>>>    rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
> >>>>
> >>>> So you could play with multiple queues matching same hash value, but that
> >>>> would be uncommon.
> >>>>
> >>>>>> An ethdev may have "global" RSS setting with an indirection table of some
> >>>>>> fixed size (say, 512). In what comes to flow rules, does that size matter?
> >>>>
> >>>> Global RSS is only used if the incoming packet does not match any rte_flow
> >>>> action. If there is a RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
> >>>> these take precedence.
> >>>
> >>> Yes, I know all of that. The question is how does the PMD select RETA size
> >>> for this action? Can it select an arbitrary value? Or should it stick with
> >>> the "global" one (eg. 512)? How does the user know the table size?
> >>>
> >>> If the user simply wants to spread traffic across the given queues,
> >>> the effective table size is a don't care to them, and the existing
> >>> API contract is fine. But if the user expects that certain packets
> >>> hit some precise queues, they need to know the table size for that.
> >>>
> > Just like you said RSS simply spread the traffic to the given queues.
> 
> Yes, to the given queues. The question is whether the 'queue' array
> has RETA properties (order matters; elements can repeat) or not.
> 

Yes, order matters and elements can repeat.
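
To illustrate why order matters, here is a minimal sketch in plain C. It
assumes the PMD programs the 'queue' array unmodified as the indirection
table and indexes it by hash modulo table size, as in the formula quoted
elsewhere in this thread. Real hardware may index differently. The helper
pick_queue() is hypothetical and is not a DPDK API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): select the Rx queue for a given
 * RSS hash, treating the ordered 'queue' array as the indirection table.
 * Assumption: the table is indexed by hash modulo the table size.
 */
static uint16_t
pick_queue(const uint16_t *queue, size_t queue_num, uint32_t hash)
{
	assert(queue != NULL && queue_num > 0);
	return queue[hash % queue_num];
}
```

Under these assumptions, the arrays { 0, 0, 1, 1 } and { 1, 1, 0, 0 } name
the same set of queues, yet a packet whose hash is 5 lands in entry 1 of
either table, which is queue 0 for the first array and queue 1 for the
second.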

> > If application wants to send traffic to some queue it should use the queue action.
> 
> Yes, but that's not what I mean. Consider the following example. The user
> generates packets with random IP addresses at machine A. These packets
> hit DPDK at machine B. For a given *packet*, the sender (A) can
> compute its RSS hash in software. This will point out the RETA
> entry index. But, in order to predict the exact *queue* index,
> the sender has to know the table (its contents, its size).
> 
Why does the application need this info?

> For a "global" DPDK RSS setting, the table can be easily obtained with
> an ethdev callback / API. Very simple. Fixed-size table, and it can
> be queried. But how does one obtain similar knowledge for RSS action?
> 
The RSS action was designed to allow a balanced traffic spread.
The size of the RETA is PMD-dependent: in some PMDs it equals the number
of queues, in others it is the number of queues rounded up to a power of 2,
so if the app requests 8 queues, the RETA will also have 8 entries.
In any case, the PMD should use the given order. If the PMD needs to expand
the table, it should cycle through the application-requested queues in the
order they were given.
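
A rough sketch of that expansion in plain C, for illustration only. It
assumes a PMD that rounds the table size up to the next power of 2 and
fills it by cycling over the requested queues in the given order. The
helper expand_reta() and its parameters are hypothetical, and no actual
PMD is claimed to work exactly this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a DPDK API): build an indirection table whose
 * size is the smallest power of 2 >= queue_num, filled by cycling over
 * queue[] in the order given. Returns the number of entries written, or
 * 0 if the input is invalid or the table does not fit into reta_cap.
 */
static size_t
expand_reta(const uint16_t *queue, size_t queue_num,
	    uint16_t *reta, size_t reta_cap)
{
	size_t size = 1;
	size_t i;

	if (queue == NULL || reta == NULL || queue_num == 0)
		return 0;

	/* Grow to the next power of 2 without exceeding the capacity. */
	while (size < queue_num) {
		if (size > reta_cap / 2)
			return 0;
		size <<= 1;
	}
	if (size > reta_cap)
		return 0;
	assert(size >= queue_num);

	/* Repeat the requested queues in order until the table is full. */
	for (i = 0; i < size; i++)
		reta[i] = queue[i % queue_num];

	return size;
}
```

With queues { 3, 5, 7 } this yields a 4-entry table { 3, 5, 7, 3 }. With 8
queues the table is the 8 queues themselves, unchanged.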


> >
> >>> So, the question is whether the users should or should not build
> >>> any expectations of the effective table size and, if they should,
> >>> are they supposed to use the "global" table size for that?
> >>
> >> You are right this area is completely undocumented. Personally would really like
> >> it if rte_flow had a reference software implementation and all the HW vendors
> >> had to make sure their HW matched the SW reference version. But this is a case
> >> where the funding is all on the HW side, and no one has time or resources
> >> to do a complete SW version..
> >>
> >> A sane implementation would configure RSS indirection as across all
> >> rx queues that were available when the device was started; ie all queues
> >> that did not have deferred start set. Then the application would start/stop
> >> queues and use rte_flow to reach them.
> >>
> >> But it doesn't appear the HW follows that model.
> >>
> >>
> >>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> >>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
> >>>>
> >>>> No the default is always Toeplitz.  This goes back to the original definition
> >>>> of RSS which is in Microsoft NDIS and uses Toeplitz.
> >>>
> >>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> >>> documentation should be more specific to say which algorithm exactly
> >>> this DEFAULT choice provides. Otherwise, it is very vague.
> >>>
> >>>>
> >>>> DPDK should have more examples of using rte_flow, I have some samples
> >>>> but they aren't that useful.
> >>>>
> >>>
> >>> I could not agree more.
> >
> > Feel free to add/suggest what example are missing.
> >
> >>>
> >>> Thanks,
> >>> Ivan M.
> >
> > Best,
> > Ori
> >
Best,
Ori
