From: Avi Kivity <avi@scylladb.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Date: Thu, 1 Oct 2015 13:50:10 +0300
Message-ID: <560D0FE2.7010905@scylladb.com>
In-Reply-To: <20151001131754-mutt-send-email-mst@redhat.com>
On 10/01/2015 01:38 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 12:59:47PM +0300, Avi Kivity wrote:
>>
>> On 10/01/2015 12:55 PM, Michael S. Tsirkin wrote:
>>> On Thu, Oct 01, 2015 at 12:22:46PM +0300, Avi Kivity wrote:
>>>> It's easy to claim that
>>>> a solution is around the corner, only no one was looking for it, but the
>>>> reality is that kernel bypass has been a solution for years for high
>>>> performance users,
>>> I never said that it's trivial.
>>>
>>> It's probably a lot of work. It's definitely more work than just abusing
>>> sysfs.
>>>
>>> But it looks like a write system call into an eventfd is about 1.5
>>> microseconds on my laptop. Even with a system call per packet, system
>>> call overhead is not what makes DPDK drivers outperform Linux ones.
>>>
>> 1.5 us = 0.6 Mpps per core limit.
> Oh, I calculated it incorrectly. It's 0.15 us. So 6 Mpps.
You also trimmed the extra work that needs to be done, that I
mentioned. Maybe your ring proxy can work, maybe it can't. In any case
it's a hefty chunk of work. Should this work block users from using
their VFs, if they happen to need interrupt support?
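For reference, the conversion behind the two figures above can be sketched as follows. This is only the arithmetic, assuming one system call per packet and no other per-packet cost; the function names mpps_limit and per_call_us are made up for this illustration.

```c
/* Upper bound on millions of packets per second per core, assuming each
 * packet costs exactly one system call of `syscall_us` microseconds and
 * nothing else is counted (an idealized assumption). */
double mpps_limit(double syscall_us)
{
    /* 1 / microseconds is the same as millions per second. */
    return 1.0 / syscall_us;
}

/* Mean per-call cost in microseconds, derived from a wall-clock run of
 * `calls` system calls that took `total_seconds` in total. */
double per_call_us(double total_seconds, long calls)
{
    return total_seconds * 1e6 / (double)calls;
}
```

With these, mpps_limit(1.5) is about 0.67 and mpps_limit(0.15) is about 6.7, which matches the rounded 0.6 Mpps and 6 Mpps figures quoted above.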
> But for RX, you can batch a lot of packets.
>
> You can see by now I'm not that good at benchmarking.
> Here's what I wrote:
>
>
> #include <stdbool.h>
> #include <sys/eventfd.h>
> #include <inttypes.h>
> #include <unistd.h>
>
>
> int main(int argc, char **argv)
> {
>     int e = eventfd(0, 0);
>     uint64_t v = 1;
>
>     int i;
>
>     for (i = 0; i < 10000000; ++i) {
>         write(e, &v, sizeof v);
>     }
> }
>
>
> This takes 1.5 seconds to run on my laptop:
>
> $ time ./a.out
>
> real 0m1.507s
> user 0m0.179s
> sys 0m1.328s
>
>
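A variant of the quoted test that reports the per-call cost directly, instead of relying on time(1) and dividing by hand, might look like the sketch below. It is Linux-specific (eventfd), and the function name eventfd_write_ns and its error convention are assumptions of this sketch, not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std= modes */
#include <stdint.h>
#include <sys/eventfd.h>
#include <time.h>
#include <unistd.h>

/* Returns the mean wall-clock cost in nanoseconds of one write() to an
 * eventfd, or -1.0 if the arguments are invalid, the eventfd cannot be
 * created, or a write fails. */
double eventfd_write_ns(long iterations)
{
    struct timespec start, end;
    uint64_t v = 1;
    long i;
    int e;

    if (iterations <= 0)
        return -1.0;

    e = eventfd(0, 0);
    if (e < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iterations; ++i) {
        /* Each write adds 1 to the eventfd counter; one syscall per loop. */
        if (write(e, &v, sizeof v) != (ssize_t)sizeof v) {
            close(e);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(e);

    return ((double)(end.tv_sec - start.tv_sec) * 1e9 +
            (double)(end.tv_nsec - start.tv_nsec)) / (double)iterations;
}
```

Calling eventfd_write_ns(10000000) repeats the quoted loop. The result varies by machine and kernel, and like the original it measures only the bare system call, not any work the kernel would do per packet.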
>> dpdk performance is in the tens of
>> millions of packets per second per system.
> I think that's with a bunch of batching though.
Yes, it's also with their application code running as well. They didn't
reach this kind of performance by spending cycles unnecessarily.
I'm not saying that the ring proxy is not workable; just that we don't
know whether it is or not, and I don't think that a patch that enables
_existing functionality_ for VFs should be blocked in favor of a new and
unproven approach.
>
>> It's not just the lack of system calls, of course, the architecture is
>> completely different.
> Absolutely - I'm not saying move all of DPDK into kernel.
> We just need to protect the RX rings so hardware does
> not corrupt kernel memory.
>
>
> Thinking about it some more, many devices
> have separate rings for DMA: TX (device reads memory)
> and RX (device writes memory).
> With such devices, a mode where userspace can write TX ring
> but not RX ring might make sense.
I'm sure you can cause havoc just by reading, if you read from I/O memory.
>
> This will mean userspace might read kernel memory
> through the device, but can not corrupt it.
>
> That's already a big win!
>
> And RX buffers do not have to be added one at a time.
> If we assume 0.2 usec per system call, batching some 100 buffers per
> system call gives you 2 nanoseconds of overhead per buffer. That seems
> quite reasonable.
You're ignoring the page table walk and other per-descriptor processing.
Again^2, maybe this can work. But it shouldn't block a patch enabling
interrupt support of VFs. After the ring proxy is available and proven
for a few years, we can deprecate bus mastering from uio, and after a
few more years remove it.
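To make the per-descriptor point concrete, here is a rough cost model, a sketch only. The function name per_buffer_ns is made up for this illustration, and any per-descriptor figure fed into it is a placeholder, since nobody has measured what a ring proxy would actually cost.

```c
/* Amortized cost in nanoseconds of posting one RX buffer when a single
 * system call of `syscall_ns` posts `batch` buffers and the kernel spends
 * an additional `per_desc_ns` on each buffer (page table walk, validation,
 * writing the descriptor). Returns -1.0 for an empty batch. */
double per_buffer_ns(double syscall_ns, unsigned batch, double per_desc_ns)
{
    if (batch == 0)
        return -1.0;

    /* Batching divides only the system call cost; the per-descriptor
     * work is paid in full for every buffer. */
    return syscall_ns / (double)batch + per_desc_ns;
}
```

per_buffer_ns(200.0, 100, 0.0) reproduces the 2 ns figure quoted above. With a purely hypothetical 50 ns of per-descriptor work, per_buffer_ns(200.0, 100, 50.0) is 52 ns, so the system call share becomes the small part and the unmeasured term dominates.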