DPDK patches and discussions
From: Ori Kam <orika@nvidia.com>
To: Ivan Malov <ivan.malov@oktetlabs.ru>
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	"NBU-Contact-Thomas Monjalon (EXTERNAL)" <thomas@monjalon.net>,
	"NBU-Contact-Adrien Mazarguil (EXTERNAL)"
	<adrien.mazarguil@6wind.com>, "dev@dpdk.org" <dev@dpdk.org>,
	Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Subject: RE: Understanding Flow API action RSS
Date: Mon, 10 Jan 2022 17:18:44 +0000
Message-ID: <MW2PR12MB4666ACD30EFF3C1A52947B99D6509@MW2PR12MB4666.namprd12.prod.outlook.com>
In-Reply-To: <37111834-aecb-ac17-1059-177287a1507e@oktetlabs.ru>

Hi Ivan,


> -----Original Message-----
> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> Subject: RE: Understanding Flow API action RSS
> 
> Hi Ori,
> 
> Many-many thanks for your commentary.
> 
> The nature of 'queue' array in flow action RSS is clear now.
> I hope PMD vendors and API users share this vision, too.
> Probably, this should be properly documented.
> We'll see what we can do in that direction.
> 
> Please see one more question below.
> 
> On Mon, 10 Jan 2022, Ori Kam wrote:
> 
> > Hi Ivan,
> >
> >> -----Original Message-----
> >> From: Ivan Malov <ivan.malov@oktetlabs.ru>
> >> Sent: Sunday, January 9, 2022 3:03 PM
> >> Subject: RE: Understanding Flow API action RSS
> >>
> >> Hi Ori,
> >>
> >> On Sun, 9 Jan 2022, Ori Kam wrote:
> >>
> >>> Hi Stephen and Ivan
> >>>
> >>>> -----Original Message-----
> >>>> From: Stephen Hemminger <stephen@networkplumber.org>
> >>>> Sent: Tuesday, January 4, 2022 11:56 PM
> >>>> Subject: Re: Understanding Flow API action RSS
> >>>>
> >>>> On Tue, 4 Jan 2022 21:29:14 +0300 (MSK)
> >>>> Ivan Malov <ivan.malov@oktetlabs.ru> wrote:
> >>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> On Tue, 4 Jan 2022, Stephen Hemminger wrote:
> >>>>>
> >>>>>> On Tue, 04 Jan 2022 13:41:55 +0100
> >>>>>> Thomas Monjalon <thomas@monjalon.net> wrote:
> >>>>>>
> >>>>>>> +Cc Ori Kam, rte_flow maintainer
> >>>>>>>
> >>>>>>> 29/12/2021 15:34, Ivan Malov:
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> In 'rte_flow.h', there is 'struct rte_flow_action_rss'. In it, 'queue' is
> >>>>>>>> to provide "Queue indices to use". But it is unclear whether the order of
> >>>>>>>> elements is meaningful or not. Does that matter? Can queue indices repeat?
> >>>>>>
> >>>>>> The order probably doesn't matter, it is like the RSS indirection table.
> >>>>>
> >>>>> Sorry, but RSS indirection table (RETA) assumes some structure. In it,
> >>>>> queue indices can repeat, and the order is meaningful. In DPDK, RETA
> >>>>> may comprise multiple "groups", each one comprising 64 entries.
> >>>>>
> >>>>> This 'queue' array in flow action RSS does not stick with the same
> >>>>> terminology, it does not reuse the definition of RETA "group", etc.
> >>>>> Just "queue indices to use". No definition of order, no structure.
> >>>>>
> >>>>> The API contract is not clear. Neither to users, nor to PMDs.
> >>>>>
> >>>> From API in RSS the queues are simply the queue ID, order doesn't matter,
> >>> Duplicating the queue may affect the spread based on the HW/PMD.
> >>> In common case each queue should appear only once and the PMD may duplicate
> >>> entries to get the best performance.
> >>
> >> Look. In a DPDK PMD, one has "global" RSS table. Consider the following
> >> example: 0, 0, 1, 1, 2, 2, 3, 3 ... and so on. As you may see, queue
> >> indices may repeat. They may have different order: 1, 1, 0, 0, ... .
> >> The order is of great importance. If you send a packet to a
> >> DPDK-powered server, you can know in advance its hash value.
> >> Hence, you may strictly predict which RSS table entry this
> >> hash will point at. That predicts the target Rx queue.
> >>
> >> So the questions which one should attempt to clarify, are as follows:
> >> 1) Is the 'queue' array ordered? (Does the order of elements matter?)
> >> 2) Can its elements repeat? (*allowed* or *not allowed*?)
> >>
> >> From API point of view the array is ordered, and may have repeating elements.
> >
> >>>
> >>>>>>
> >>>>>>    rx queue = RSS_indirection_table[ RSS_hash_value % RSS_indirection_table_size ]
> >>>>>>
> >>>>>> So you could play with multiple queues matching same hash value, but that
> >>>>>> would be uncommon.
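The lookup formula above can be sketched in plain C. This is a minimal illustration, not DPDK code: `reta_lookup` and its parameters are hypothetical names, and a real PMD or NIC may derive the index differently (for example by masking the low bits of the hash).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the Rx queue for a packet from its RSS hash,
 * following "rx queue = table[hash % table_size]".
 */
static uint16_t
reta_lookup(const uint16_t *reta, size_t reta_size, uint32_t rss_hash)
{
    assert(reta != NULL && reta_size > 0);
    return reta[rss_hash % reta_size];
}
```

With the table 0, 0, 1, 1, 2, 2, 3, 3 a hash value of 5 selects queue 2. With the reordered table 1, 1, 0, 0, 2, 2, 3, 3 a hash value of 0 selects queue 1 instead of queue 0, so both the order and the repetition of entries change the outcome.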
> >>>>>>
> >>>>>>>> An ethdev may have "global" RSS setting with an indirection table of some
> >>>>>>>> fixed size (say, 512). In what comes to flow rules, does that size matter?
> >>>>>>
> >>>>>> Global RSS is only used if the incoming packet does not match any rte_flow
> >>>>>> action. If there is an RTE_FLOW_ACTION_TYPE_QUEUE or RTE_FLOW_ACTION_TYPE_RSS
> >>>>>> these take precedence.
> >>>>>
> >>>>> Yes, I know all of that. The question is how does the PMD select RETA size
> >>>>> for this action? Can it select an arbitrary value? Or should it stick with
> >>>>> the "global" one (eg. 512)? How does the user know the table size?
> >>>>>
> >>>>> If the user simply wants to spread traffic across the given queues,
> >>>>> the effective table size is a don't care to them, and the existing
> >>>>> API contract is fine. But if the user expects that certain packets
> >>>>> hit some precise queues, they need to know the table size for that.
> >>>>>
> >>> Just like you said, RSS simply spreads the traffic to the given queues.
> >>
> >> Yes, to the given queues. The question is whether the 'queue' array
> >> has RETA properties (order matters; elements can repeat) or not.
> >>
> >
> > Yes order matters and elements can repeat.
> >
> >>> If the application wants to send traffic to some queue, it should use the queue action.
> >>
> >> Yes, but that's not what I mean. Consider the following example. The user
> >> generates packets with random IP addresses at machine A. These packets
> >> hit DPDK at machine B. For a given *packet*, the sender (A) can
> >> compute its RSS hash in software. This will point out the RETA
> >> entry index. But, in order to predict the exact *queue* index,
> >> the sender has to know the table (its contents, its size).
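The software hash computation mentioned above can be sketched as follows, assuming the device uses the Toeplitz algorithm. `toeplitz_hash` is a hypothetical name, and the caller is assumed to pass the hash input (for example source address, destination address, source port, destination port) in network byte order. The result only identifies a table index; mapping it to a queue still requires the table contents and size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Software Toeplitz hash (illustrative). For every set bit of the input,
 * taken most significant bit first, XOR in the 32-bit window of the key
 * that starts at the same bit offset.
 * The key must be at least len + 4 bytes long.
 */
static uint32_t
toeplitz_hash(const uint8_t *key, size_t key_len,
              const uint8_t *data, size_t len)
{
    assert(key_len >= len + 4);

    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1;
        }
    }
    return hash;
}
```

An all-zero input hashes to 0, and an input whose only set bit is the very first one hashes to the first 32 bits of the key, which makes the function easy to sanity-check.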
> >>
> > Why does the application need this info?
> >
> >> For a "global" DPDK RSS setting, the table can be easily obtained with
> >> an ethdev callback / API. Very simple. Fixed-size table, and it can
> >> be queried. But how does one obtain similar knowledge for RSS action?
> >>
> > The RSS action was designed to allow balanced traffic spread.
> > The size of the RETA is PMD dependent: in some PMDs it will be
> > the number of queues, in others the number of queues rounded up to a
> > power of 2. So if the app requested 8 queues, the RETA will also have 8 entries.
> > In any case the PMD should use the given order. If the PMD needs to expand
> > the table, it should cycle over the application-requested queues in the order they were given.
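The expansion rule described above can be sketched like this. It is a minimal illustration under stated assumptions, not code from any PMD: `next_pow2` and `expand_queues` are hypothetical helpers, and the power-of-two table size is just one of the sizing policies mentioned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two (n >= 1). */
static size_t
next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Hypothetical PMD-side helper: fill a table of reta_size entries by
 * cycling over the queues in exactly the order the application gave.
 */
static void
expand_queues(const uint16_t *queue, size_t queue_num,
              uint16_t *reta, size_t reta_size)
{
    assert(queue != NULL && reta != NULL && queue_num > 0);

    for (size_t i = 0; i < reta_size; i++)
        reta[i] = queue[i % queue_num];
}
```

For example, the queue list 3, 1, 2 expanded to a 4-entry table becomes 3, 1, 2, 3. Queue 3 then receives a larger share of hash values, which is one way the effective spread can differ from what the application listed.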
> >
> >
> >>>
> >>>>> So, the question is whether the users should or should not build
> >>>>> any expectations of the effective table size and, if they should,
> >>>>> are they supposed to use the "global" table size for that?
> >>>>
> >>>> You are right, this area is completely undocumented. Personally, I would really like
> >>>> it if rte_flow had a reference software implementation and all the HW vendors
> >>>> had to make sure their HW matched the SW reference version. But this is a case
> >>>> where the funding is all on the HW side, and no one has time or resources
> >>>> to do a complete SW version.
> >>>>
> >>>> A sane implementation would configure RSS indirection across all
> >>>> rx queues that were available when the device was started; ie all queues
> >>>> that did not have deferred start set. Then the application would start/stop
> >>>> queues and use rte_flow to reach them.
> >>>>
> >>>> But it doesn't appear the HW follows that model.
> >>>>
> >>>>
> >>>>>>>> When the user selects 'RTE_ETH_HASH_FUNCTION_DEFAULT' in action RSS, does
> >>>>>>>> that allow the PMD to configure an arbitrary, non-Toeplitz hash algorithm?
> 
> What do you think about the above question? In my opinion, DEFAULT should
> let the PMD select whatever hash function / algorithm it may want to
> select. Just some vendor-specific optimal choice.
> 
> If the user wants exactly Toeplitz / "standard RSS hash" behaviour,
> they can always specify enum TOEPLITZ. And the PMD must either
> comply or reject. What do you think? Are we on the same page?
> 

Fully agree with you.
The same goes if the user doesn't supply the key: the PMD should select some default value.
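The agreed semantics can be sketched as follows. Everything here is a hypothetical model for illustration: the enum and structs are local stand-ins rather than the DPDK definitions, and `resolve_rss` is not an existing API. DEFAULT lets the driver pick its preferred function, an explicit TOEPLITZ request is either honoured or rejected, and a missing key is replaced by a driver default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the hash function choices discussed here. */
enum hash_func { HASH_FUNC_DEFAULT, HASH_FUNC_TOEPLITZ, HASH_FUNC_SIMPLE_XOR };

/* Hypothetical description of what a driver supports and prefers. */
struct pmd_caps {
    bool supports_toeplitz;
    enum hash_func preferred;       /* vendor-specific optimal choice */
    const uint8_t *default_key;
    size_t default_key_len;
};

struct rss_request {
    enum hash_func func;
    const uint8_t *key;             /* NULL means "no key supplied" */
    size_t key_len;
};

/* Returns 0 and fills *out on success, -1 if the request is rejected. */
static int
resolve_rss(const struct pmd_caps *caps, const struct rss_request *req,
            struct rss_request *out)
{
    assert(caps != NULL && req != NULL && out != NULL);

    /* An explicit Toeplitz request must be honoured exactly or refused. */
    if (req->func == HASH_FUNC_TOEPLITZ && !caps->supports_toeplitz)
        return -1;

    *out = *req;
    if (req->func == HASH_FUNC_DEFAULT)
        out->func = caps->preferred;
    if (req->key == NULL || req->key_len == 0) {
        out->key = caps->default_key;
        out->key_len = caps->default_key_len;
    }
    return 0;
}
```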

> >>>>>>
> >>>>>> No, the default is always Toeplitz.  This goes back to the original definition
> >>>>>> of RSS which is in Microsoft NDIS and uses Toeplitz.
> >>>>>
> >>>>> Then why have a dedicated enum named TOEPLITZ? Also, once again, the
> >>>>> documentation should be more specific to say which algorithm exactly
> >>>>> this DEFAULT choice provides. Otherwise, it is very vague.
> >>>>>
> >>>>>>
> >>>>>> DPDK should have more examples of using rte_flow, I have some samples
> >>>>>> but they aren't that useful.
> >>>>>>
> >>>>>
> >>>>> I could not agree more.
> >>>
> >>> Feel free to add or suggest which examples are missing.
> >>>
> >>>>>
> >>>>> Thanks,
> >>>>> Ivan M.
> >>>
> >>> Best,
> >>> Ori
> >>>
> > Best,
> > Ori
> >
> 
> Best regards,
> Ivan M.

Thread overview: 10+ messages
2021-12-29 14:34 Ivan Malov
2022-01-04 12:41 ` Thomas Monjalon
2022-01-04 16:54   ` Stephen Hemminger
2022-01-04 18:29     ` Ivan Malov
2022-01-04 21:56       ` Stephen Hemminger
2022-01-09 12:23         ` Ori Kam
2022-01-09 13:03           ` Ivan Malov
2022-01-10  9:54             ` Ori Kam
2022-01-10 15:04               ` Ivan Malov
2022-01-10 17:18                 ` Ori Kam [this message]