DPDK patches and discussions
From: Bruce Richardson <bruce.richardson@intel.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Avi Kivity <avi@scylladb.com>
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Date: Thu, 1 Oct 2015 13:07:13 +0100
Message-ID: <20151001120713.GA11504@bricha3-MOBL3>
In-Reply-To: <20151001141124-mutt-send-email-mst@redhat.com>

On Thu, Oct 01, 2015 at 02:23:17PM +0300, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 12:08:07PM +0100, Bruce Richardson wrote:
> > On Thu, Oct 01, 2015 at 01:38:37PM +0300, Michael S. Tsirkin wrote:
> > > On Thu, Oct 01, 2015 at 12:59:47PM +0300, Avi Kivity wrote:
> > > > 
> > > > 
> > > > On 10/01/2015 12:55 PM, Michael S. Tsirkin wrote:
> > > > >On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
> > > > >>It's easy to claim that
> > > > >>a solution is around the corner, only no one was looking for it, but the
> > > > >>reality is that kernel bypass has been a solution for years for high
> > > > >>performance users,
> > > > >I never said that it's trivial.
> > > > >
> > > > >It's probably a lot of work. It's definitely more work than just abusing
> > > > >sysfs.
> > > > >
> > > > >But it looks like a write system call into an eventfd is about 1.5
> > > > >microseconds on my laptop. Even with a system call per packet, system
> > > > >call overhead is not what makes DPDK drivers outperform Linux ones.
> > > > >
> > > > 
> > > > 1.5 us = 0.6 Mpps per core limit.
> > > 
> > > Oh, I calculated it incorrectly. It's 0.15 us. So 6Mpps.
> > > But for RX, you can batch a lot of packets.
> > > 
> > > You can see by now I'm not that good at benchmarking.
> > > Here's what I wrote:
> > > 
> > > 
> > > #include <inttypes.h>
> > > #include <sys/eventfd.h>
> > > #include <unistd.h>
> > > 
> > > 
> > > /* Issue 10 million write() system calls on an eventfd. */
> > > int main(int argc, char **argv)
> > > {
> > >         int e = eventfd(0, 0);
> > >         uint64_t v = 1;
> > > 
> > >         int i;
> > > 
> > >         if (e < 0)
> > >                 return 1;
> > > 
> > >         for (i = 0; i < 10000000; ++i) {
> > >                 /* Each write adds v to the eventfd counter. */
> > >                 write(e, &v, sizeof v);
> > >         }
> > >         return 0;
> > > }
> > > 
> > > 
> > > This takes 1.5 seconds to run on my laptop:
> > > 
> > > $ time ./a.out 
> > > 
> > > real    0m1.507s
> > > user    0m0.179s
> > > sys     0m1.328s
> > > 
> > > 
> > > > dpdk performance is in the tens of
> > > > millions of packets per system.
> > > 
> > > I think that's with a bunch of batching though.
> > > 
> > > > It's not just the lack of system calls, of course, the architecture is
> > > > completely different.
> > > 
> > > Absolutely - I'm not saying move all of DPDK into kernel.
> > > We just need to protect the RX rings so hardware does
> > > not corrupt kernel memory.
> > > 
> > > 
> > > Thinking about it some more, many devices
> > > have separate rings for DMA: TX (device reads memory)
> > > and RX (device writes memory).
> > > With such devices, a mode where userspace can write TX ring
> > > but not RX ring might make sense.
> > > 
> > > This will mean userspace might read kernel memory
> > > through the device, but can not corrupt it.
> > > 
> > > That's already a big win!
> > > 
> > > And RX buffers do not have to be added one at a time.
> > > If we assume 0.2usec per system call, batching some 100 buffers per
> > > system call gives you 2 nanoseconds of overhead per buffer.  That seems
> > > quite reasonable.
> > > 
> > Hi,
> > 
> > just to jump in a bit on this.
> > 
> > Batching of 100 packets is a very large batch, and will add to latency.
> 
> 
> 
> This is not on transmit or receive path!
> This is only for re-adding buffers to the receive ring.
> This batching should not add latency at all:
> 
> 
> process rx:
> 	get packet
> 	packets[n] = alloc packet
> 	if (++n >= 100) {
> 		system call: add bufs(packets, n);
> 		n = 0;
> 	}
> 
> 
> 
> 
> 
> > The
> > standard batch size in DPDK right now is 32, and even that may be too high for
> > applications in certain domains.
> > 
> > However, even with that 2ns of overhead calculation, I'd make a few additional
> > points.
> > * For DPDK, we are reasonably close to being able to do 40Gb/s of IO - both RX
> > and TX on a single thread. 10Gb/s of IO doesn't really stress a core any more. For
> > 40Gb/s of small packet traffic, the packet arrival interval is 16.8ns, so even with a
> > huge batch size of 100 packets, your system call overhead on RX is taking almost
> > 12% of our processing time. For a batch size of 32 this overhead would rise to
> > over 35% of our packet processing time.
> 
> As I said, yes, measurable, but not breaking the bank, and that's with
> 40Gb/s, which is still not widespread.
> With 10Gb/s and 100 packets, only 3% overhead.
> 
> > For 100G line rate, the packet arrival
> > interval is just 6.7ns...
> 
> Hypervisors still have time to get their act together and support IOMMUs
> by the time 100G systems become widespread.
> 
> > * As well as this overhead from the system call itself, you are also omitting
> > the overhead of scanning the RX descriptors.
> 
> I omit it because scanning descriptors can still be done in userspace,
> just write-protect the RX ring page.
> 
> 
> > This in itself is going to use up
> > a good proportion of the processing time, as well as that we have to spend cycles
> > copying the descriptors from one ring in memory to another. Given that right now
> > with the vector ixgbe driver, the cycle cost per packet of RX is just a few dozen
> > cycles on modern cores, every additional cycle (fraction of a nanosecond) has
> > an impact.
> > 
> > Regards,
> > /Bruce
> 
> See above.  There is no need for that on data path. Only re-adding
> buffers requires a system call.
> 

Re-adding buffers is a key part of the data path! OK, the fact that it's only on
descriptor rearm does allow somewhat bigger batches, but the whole point of having
the kernel do this extra work you propose is to allow the kernel to scan and
sanitize the physical addresses - and that will take a lot of cycles, especially
if it has to handle all the different descriptor formats of all the different NICs,
as has already been pointed out.

/Bruce
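
The batched rearm under discussion can be sketched roughly as below. This is
only an illustration, not code from DPDK or the kernel: add_bufs() is a
hypothetical stand-in for the proposed system call, the non-zero address test
is a placeholder for whatever sanitizing the kernel would really do, and
REARM_BATCH, rearm_queue and rearm_one() are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REARM_BATCH 32  /* assumed batch size; the thread mentions 32 and 100 */

/* Counters so the cost of the (hypothetical) kernel side is visible. */
static size_t flush_calls;
static size_t bufs_checked;

/* Hypothetical stand-in for the proposed system call: every buffer
 * address is checked before it would be written into the RX ring. */
static int add_bufs(const uint64_t *addrs, size_t n)
{
        size_t i;

        flush_calls++;
        for (i = 0; i < n; i++) {
                if (addrs[i] == 0)      /* placeholder validity check */
                        return -1;
                bufs_checked++;
        }
        return 0;
}

struct rearm_queue {
        uint64_t addrs[REARM_BATCH];
        size_t n;
};

/* Queue one replacement buffer; call add_bufs() only when the batch is full. */
static int rearm_one(struct rearm_queue *q, uint64_t addr)
{
        int rc = 0;

        assert(q->n < REARM_BATCH);
        q->addrs[q->n++] = addr;
        if (q->n == REARM_BATCH) {
                rc = add_bufs(q->addrs, q->n);
                q->n = 0;       /* start a fresh batch */
        }
        return rc;
}
```

Queuing 70 buffers through rearm_one() with this batch size results in two
add_bufs() calls covering 64 buffers, with 6 left pending. The per-buffer loop
inside add_bufs() is where the sanitizing cost would land, and it scales with
the number of buffers regardless of how large the batch is.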

> -- 
> MST
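
The overhead figures traded in this message can be reproduced with a few lines
of arithmetic. The sketch below assumes that "small packet" means minimum-size
64-byte Ethernet frames plus the 20 bytes of preamble, start delimiter and
inter-frame gap each frame occupies on the wire, which is what gives 16.8ns at
40Gb/s. The function names are made up for the example.

```c
#include <assert.h>

/* Per-frame wire overhead: 7-byte preamble, 1-byte start-of-frame
 * delimiter and 12-byte inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20

/* Time between back-to-back frames in ns; bits / (Gbit/s) gives ns. */
static double arrival_ns(double gbit_per_s, unsigned frame_bytes)
{
        assert(gbit_per_s > 0.0);
        return (frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0 / gbit_per_s;
}

/* Share of the per-packet time budget consumed when one system call
 * of syscall_ns is amortized over a batch of packets. */
static double syscall_share(double syscall_ns, unsigned batch, double budget_ns)
{
        assert(batch > 0 && budget_ns > 0.0);
        return (syscall_ns / batch) / budget_ns;
}
```

With a 200ns system call, syscall_share(200, 100, arrival_ns(40, 64)) comes to
about 0.12 and syscall_share(200, 32, arrival_ns(40, 64)) to about 0.37, which
matches the "almost 12%" and "over 35%" quoted above. At 10Gb/s with a batch of
100 the share is about 0.03, and arrival_ns(100, 64) is 6.72ns.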
