From: Bruce Richardson <bruce.richardson@intel.com>
To: Matthew Hall <mhall@mhcomputing.net>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH RFC] librte_reorder: new reorder library
Date: Fri, 10 Oct 2014 11:59:06 +0100
Message-ID: <20141010105906.GA12696@BRICHA3-MOBL>
In-Reply-To: <20141009171135.GA8620@mhcomputing.net>
On Thu, Oct 09, 2014 at 10:11:35AM -0700, Matthew Hall wrote:
> On Thu, Oct 09, 2014 at 10:14:21AM +0100, Bruce Richardson wrote:
> > Hi Matthew,
> >
> > What you are doing will indeed work, and it's the way the vast majority of
> > the sample apps are written. However, this will not always work for everyone
> > else, sadly.
> >
> > First off, with RSS, there are a number of limitations. On the 1G and 10G
> > NICs RSS works only with IP traffic, and won't work in cases with other
> > protocols or where IP is encapsulated in anything other than a single VLAN.
> > Those cases need software load distribution. As well as this, you have very
> > little control over where flows get put, as the separation into queues
> > (which go to cores) is done only on the low seven bits of the RSS hash. For
> > applications which work with a small number of flows, e.g. where multiple
> > flows are contained inside a single tunnel, you get a large flow imbalance,
> > where you get far more traffic coming to one queue/core than to another.
> > Again in this instance, software load balancing is needed.
> >
> > Secondly, then, based off that, it is entirely possible when doing software
> > load balancing to strictly process packets for a flow in order - and indeed
> > this is what the existing packet distributor does. However, for certain
> > types of flow where processing of packets for that flow can be done in
> > parallel, forcing things to be done serially can slow things down. As well
> > as this, there can sometimes be requirements for the load balancing between
> > cores to be done as fairly as possible so that it is guaranteed that all
> > cores have approx the same load, irrespective of the number of input flows.
> > In these cases, having the option to blindly distribute traffic to cores and
> > then reorder packets on TX is the best way to ensure even load distribution.
> > It's not going to be for everyone, but it's good to have the option - and
> > there are a number of people doing things this way already.
> >
> > Lastly, there is also the assumption being made that all flows are
> > independent, which again may not always be the case. If you need ordering
> > across flows and to share load between cores then reordering on transmission
> > is the only way to do things.
> >
> > Hope this helps,
> >
> > Regards,
> > /Bruce
>
> Bruce,
>
> This explanation is of excellent quality.
>
> It would be nice if it could be made into a whitepaper about the different
> L2-L7 acceleration technologies available in the Intel NICs, popular VNICs
> (virtio-net and vmxnet3), Intel CPUs, and DPDK code, all working together. Or
> incorporated into such a document if it already exists.
>
> Without things like this it's very hard to understand when and how the
> different accelerations can be used together, when they'll work, and when
> they won't work.
>
> For example, I didn't know RSS only worked on IP... I was assuming it would do
> a consistent-hash of MACs for non-IP packets at least... also, when it
> doesn't know what to do, does it send them to the default queue, or a random
> FIFO RX queue picks it up or what?
>
When RSS gets a non-IP packet, or a packet it can't hash, that packet will
be put into queue 0. This leads to a number of little tricks we can use if
we have a mix of IP/non-IP traffic.

1. To simply separate out IP traffic from non-IP traffic, we just turn on
RSS and update the redirection table (RETA) to have all entries set to
queue 1. This means that all IP traffic goes to queue 1, and all other
traffic to queue 0.

2. If you want to separate IPv6 from IPv4, you can do the exact same thing
as in point 1, except only turn on RSS for one of the protocols. If you only
turn on RSS for IPv4, then IPv6 traffic should be treated as non-IP and go
to queue 0.

3. If you have IP and non-IP traffic going to a set of ports and are using
multiple RSS queues to split that traffic across multiple cores, such that
each core also reads from each port [e.g. 4 ports, and 4 cores, where each
core reads one RSS queue on each port], you can "rotate" the RSS table
between ports so that you also load-balance the non-IP traffic coming in.
Taking that example, instead of having core 0 read queue 0 on each port,
you have the values that hash to queue 0 on port 0 get directed to queue 1
on port 1, queue 2 on port 2, etc. Then core 0 [and every other core] reads
a different queue number on each port - while still getting the same flows.
Furthermore, since the non-IP traffic is unaffected and always goes to
queue 0, the non-IP traffic to each port gets handled by a different core,
rather than all non-IP traffic going to core 0 as would be the case in the
default setup. [Yes, this would be the case too if you took the simple
option of just having one core per port, but doing things this way also
gives you load balancing if one port is busier than the others.]

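To make tricks 1 and 3 a bit more concrete, here is a minimal sketch in
plain C. It only models the redirection table as an array and does not
program a NIC. Assumptions, none of which come from the datasheet text
itself: the table has 128 entries (matching the "low seven bits" mentioned
earlier in the thread), the default layout is a simple round-robin over the
queues, and the helper names reta_fill_single, reta_fill_rotated and
core_for_queue are hypothetical. In a real DPDK application the resulting
table would be written to the port through the ethdev RETA update call
(rte_eth_dev_rss_reta_update), which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table size: 2^7 entries, indexed by the low seven hash bits. */
#define RETA_SIZE 128

/* Trick 1: point every entry at a single queue, so all hashable (IP)
 * traffic lands on that queue while unhashable traffic stays on queue 0. */
static void reta_fill_single(uint16_t reta[RETA_SIZE], uint16_t queue)
{
    for (size_t i = 0; i < RETA_SIZE; i++)
        reta[i] = queue;
}

/* Trick 3: start from an assumed round-robin layout (entry i -> i % n)
 * and rotate it by the port number, so the entries that map to queue 0
 * on port 0 map to queue 1 on port 1, queue 2 on port 2, and so on. */
static void reta_fill_rotated(uint16_t reta[RETA_SIZE], uint16_t port,
                              uint16_t nb_queues)
{
    for (size_t i = 0; i < RETA_SIZE; i++)
        reta[i] = (uint16_t)((i + port) % nb_queues);
}

/* Inverse of the rotation: which core should poll `queue` on `port` so
 * that each core keeps seeing the same table entries on every port.
 * Assumes one core per queue index, i.e. nb_queues cores in total. */
static uint16_t core_for_queue(uint16_t port, uint16_t queue,
                               uint16_t nb_queues)
{
    return (uint16_t)((queue + nb_queues - port % nb_queues) % nb_queues);
}
```

With 4 ports and 4 queues, core_for_queue(p, 0, 4) gives cores 0, 3, 2, 1
for ports 0 to 3, so the unhashed traffic that always arrives on queue 0 is
picked up by a different core on each port.
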
Finally, I'd just note that RSS is documented in section 7.1.2.8 of the
datasheet for the Intel 82599 10 GbE Controller, so read up there for any
more information.

Regards,
/Bruce