From: Alexander Duyck <alexander.duyck@gmail.com>
To: "Michael S. Tsirkin" <mst@redhat.com>,
Bruce Richardson <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>, Avi Kivity <avi@scylladb.com>
Subject: Re: [dpdk-dev] Having troubles binding an SR-IOV VF to uio_pci_generic on Amazon instance
Date: Thu, 1 Oct 2015 14:02:24 -0700
Message-ID: <560D9F60.6040907@gmail.com>
In-Reply-To: <20151001155943-mutt-send-email-mst@redhat.com>
On 10/01/2015 06:14 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 01, 2015 at 01:07:13PM +0100, Bruce Richardson wrote:
>>>> This in itself is going to use up
>>>> a good proportion of the processing time, as well as that we have to spend cycles
>>>> copying the descriptors from one ring in memory to another. Given that right now
>>>> with the vector ixgbe driver, the cycle cost per packet of RX is just a few dozen
>>>> cycles on modern cores, every additional cycle (fraction of a nanosecond) has
>>>> an impact.
>>>>
>>>> Regards,
>>>> /Bruce
>>> See above. There is no need for that on data path. Only re-adding
>>> buffers requires a system call.
>>>
>> Re-adding buffers is a key part of the data path! Ok, the fact that its only on
>> descriptor rearm does allow somewhat bigger batches,
> That was the point, yes.
>
>> but the whole point of having
>> the kernel do this extra work you propose is to allow the kernel to scan and
>> sanitize the physical addresses - and that will take a lot of cycles, especially
>> if it has to handle all the different descriptor formats of all the different NICs,
>> as has already been pointed out.
>>
>> /Bruce
> Well the driver would be per NIC, so there's only need to support
> specific formats supported by a given NIC.
One thing that seems to be overlooked in your discussion is the cost to
translate these descriptors. It isn't as if most systems running DPDK
have the cycles to spare. As I believe was brought up in another thread
we are looking at a budget of something like 68ns per packet at 10Gbps
line rate.
The overhead for having to go through and translate/parse/validate the
descriptors would end up being pretty significant. If you need proof of
that just try running the ixgbe driver and route small packets. We end
up spending something like 40ns in ixgbe_clean_rx_irq and that is mostly
just translating the descriptor bits into the correct sk_buff bits.
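To put rough numbers on that budget, here is a minimal sketch of the arithmetic, assuming minimum-size 64-byte Ethernet frames plus the usual 20 bytes of preamble, start-of-frame delimiter and inter-frame gap on the wire. The helper names (wire_time_ps, cycles_for) and the 3 GHz clock mentioned afterwards are illustrative assumptions, not something taken from a driver.

```c
#include <assert.h>
#include <stdint.h>

/* Per-frame overhead on the wire: 7B preamble + 1B SFD + 12B inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20u

/*
 * Time in picoseconds that one frame occupies the link.
 * frame_bytes includes the 4-byte FCS (64 for a minimum-size frame).
 * Hypothetical helper, for illustration only.
 */
static uint64_t wire_time_ps(uint32_t frame_bytes, uint64_t link_bps)
{
    assert(link_bps != 0);
    uint64_t bits = (uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8u;
    return bits * 1000000000000ull / link_bps;
}

/* Whole CPU cycles available in a time slice at a given core clock (Hz). */
static uint64_t cycles_for(uint64_t time_ps, uint64_t cpu_hz)
{
    return time_ps * cpu_hz / 1000000000000ull;
}
```

For 64-byte frames at 10Gbps this gives 67200 ps, or about 67ns per packet, which is roughly 200 cycles on an assumed 3 GHz core. Taking the ~40ns figure above out of that leaves under 30ns for everything else.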
Also trying to maintain a user-space ring in addition to the
kernel-space ring means that much more memory overhead and increases
the likelihood of things getting pushed out of the L1 cache.
As far as the descriptor validation itself the overhead for that would
guarantee that you cannot get any performance out of the device. There
are too many corner cases that would have to be addressed in validating
user-space input to allow for us to process packets in any sort of
timely fashion. For starters we would have to validate the size,
alignment, and ownership of a given buffer. If it is a transmit buffer
we have to go through and validate any offloads being requested. Likely
just the validation and translation would add 10s if not 100s of
nanoseconds to the time needed to process each packet. In addition we
are talking about doing this in kernel space which means we wouldn't
really be able to take advantage of things like SSE or AVX instructions.
> An alternative is to format the descriptors in kernel, based
> on just the list of addresses. This seems cleaner, but I don't
> know how efficient it would be.
>
> Device vendors and dpdk developers are probably the best people to
> figure out what's the best thing to do here.
As far as the bifurcated driver approach goes, the only way something
like that would ever work is if you could limit the access via an IOMMU.
At least everything I have seen proposed for a bifurcated driver still
involved an IOMMU if they expected to get any performance.
> But it looks like it's not going to happen unless security is made
> a requirement for upstreaming code.
The fact is we already ship uio_pci_generic. User space drivers are
here to stay. What is being asked for is an extension to the existing
infrastructure to allow MSI-X interrupts to trigger an event on a file
descriptor. As far as I know that doesn't add any additional security
risk since it is the kernel PCIe subsystem itself that would be
programming the address and data for said device. It wouldn't actually
grant any more access other than the additional file descriptors to
support MSI-X vectors.
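To make the "event on a file descriptor" idea concrete, here is a minimal sketch of the user-space side using a plain Linux eventfd. This is only an assumption about what such an interface could look like: uio_pci_generic does not wire MSI-X vectors to eventfds, so the simulate_irq helper below writes to the descriptor from user space to stand in for the kernel signaling it. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Stand-in for the kernel side: add `events` to the eventfd counter.
 * In the extension being discussed the kernel would do this when the
 * MSI-X vector fires. Returns 0 on success, -1 on error.
 */
static int simulate_irq(int efd, uint64_t events)
{
    ssize_t n = write(efd, &events, sizeof(events));
    return n == (ssize_t)sizeof(events) ? 0 : -1;
}

/*
 * User-space side: block until at least one event is pending, then
 * return how many were signaled since the last read. Reading resets
 * the counter for a non-semaphore eventfd.
 */
static int wait_for_irq(int efd, uint64_t *count)
{
    assert(count != NULL);
    ssize_t n = read(efd, count, sizeof(*count));
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

A caller would create one descriptor per vector with eventfd(0, 0) and either block in wait_for_irq or add the descriptors to an epoll set. The descriptor only carries a counter, so user space gains no access to the MSI-X address and data programming.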
Anyway that is just my $.02 on this.
- Alex