DPDK patches and discussions
From: Andy Lutomirski <luto@kernel.org>
To: Stephen Hemminger <stephen@networkplumber.org>
Cc: Andy Lutomirski <luto@kernel.org>,
	dev@dpdk.org, Thomas Gleixner <tglx@linutronix.de>,
	 Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [dpdk-dev] Please stop using iopl() in DPDK
Date: Fri, 25 Oct 2019 17:27:22 -0700
Message-ID: <CALCETrW1HT_sSbjEodukboy9GxXosVu1uopWX8CBdFh8S7JiSQ@mail.gmail.com>
In-Reply-To: <20191025091310.05770edc@hermes.lan>

> On Oct 25, 2019, at 9:13 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> On Thu, 24 Oct 2019 21:45:56 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
>
>> Hi all-
>>
>> Supporting iopl() in the Linux kernel is becoming a maintainability
>> problem.  As far as I know, DPDK is the only major modern user of
>> iopl().
>>
>> After doing some research, I found that DPDK uses direct I/O port
>> access for only one purpose: accessing legacy virtio configuration
>> structures. These structures are mapped in I/O space in BAR 0 on
>> legacy virtio devices.
>
> Yes. Legacy virtio seems to have been designed without consideration
> of how to use it in userspace. Xen, Vmware and Hyper-V all use memory
> as a doorbell mechanism which is easier to use from userspace.
>
>
>> There are at least three ways you could avoid using iopl().  Here they
>> are in rough order of quality in my opinion:
>>
>> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
>> read() and write() on resource0 in sysfs.
>
> The cost of entering the kernel for a doorbell mechanism is too
> expensive and would kill performance.
>
>
>> 2. Use the alternative access mechanism in the virtio legacy spec:
>> there is a way to access all of these structures via configuration
>> space.
>
> There is no way to use a memory doorbell on older versions of virtio.
> Users want to run DPDK on old stuff like RHEL6 and even older
> kernel forks. There are even use cases where virtio is used with
> a non-Linux host, such as GCP.
>
>
>> 3. Use ioperm() instead of iopl().
>
> ioperm() has the wrong thread semantics. All DPDK applications have
> multiple threads, and the initialization logic needs to work even
> if a thread is started later; threads can also be started by
> the user application.
>
> iopl() applies to the whole process, so this is not an issue.

This is not true. ioperm() and iopl() have identical thread semantics.
I think what you’re seeing is that you can set iopl(3) early without
knowing which port range to request. You could alternatively set
ioperm() early and ask for a very wide range.  In principle, we could
make ioperm() be per thread, but I’m not sure we should add that kind
of complexity to support a mostly obsolete use case like this.
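
A minimal sketch of the "set ioperm() early and ask for a very wide range" alternative, assuming an x86 Linux host, glibc's <sys/io.h>, and a process that holds CAP_SYS_RAWIO. The function name grant_all_ioports() is hypothetical, not a DPDK API. The idea is to call it once during initialization, before any worker thread is spawned, so that threads created afterwards inherit the permission the same way they would inherit iopl(3).

```c
#include <errno.h>
#include <sys/io.h>  /* ioperm(); x86 glibc only */

/* Hypothetical init-time hook (not a DPDK API).
 * Requests access to the whole 16-bit x86 port space (0x0000-0xFFFF)
 * in a single call, so the BAR addresses need not be known yet.
 * Call it before spawning worker threads: threads created afterwards
 * inherit the permission, just as they would inherit iopl(3).
 * Returns 0 on success or a negative errno value
 * (e.g. -EPERM when the caller lacks CAP_SYS_RAWIO). */
int grant_all_ioports(void)
{
    if (ioperm(0, 0x10000, 1) != 0)
        return -errno;
    return 0;
}
```

As with iopl(3), a thread that was already running when this call is made does not gain access; only the caller and threads it creates later do.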

There's actually an argument to be made that per-mm ioperm would be
easier to handle in the kernel than per-task due to the vagaries of
KPTI.

All this being said, what are the actual performance implications of
write() to /sys/.../resource0?  Off the top of my head, I would guess
that the OUTB or OUTL instruction itself is already very slow, because
it is trapped and emulated, and that virtio-legacy hypervisors aren't
particularly fast to begin with. As a result, the extra write() might
not matter that much.
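
A minimal sketch of option 1, assuming the legacy virtio device's I/O BAR 0 is exposed as a sysfs resource0 file and that, as far as I know, the kernel accepts aligned 1-, 2- and 4-byte reads and writes on such I/O resource files. The helpers legacy_bar_open(), legacy_ioport_read(), legacy_ioport_write() and legacy_notify_queue() are hypothetical stand-ins for pci_uio_ioport_read()/pci_uio_ioport_write(), and the PCI address in the comment is only an example. The register offsets follow the legacy virtio PCI layout (the VIRTIO_PCI_* values in Linux's <linux/virtio_pci.h>). Every access, including the queue-notify doorbell, costs one pread()/pwrite() system call, which is the overhead being weighed here.

```c
#define _POSIX_C_SOURCE 200809L  /* pread(), pwrite(), O_CLOEXEC, mkstemp() */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Legacy virtio PCI register offsets within I/O BAR 0
 * (same values as VIRTIO_PCI_* in Linux's <linux/virtio_pci.h>). */
#define LEGACY_VIRTIO_QUEUE_NOTIFY 16  /* 16-bit: queue notify ("doorbell") */
#define LEGACY_VIRTIO_STATUS       18  /* 8-bit: device status */

/* Hypothetical helper: open the sysfs file exposing the I/O BAR, e.g.
 * "/sys/bus/pci/devices/0000:00:04.0/resource0" (example address). */
int legacy_bar_open(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* Write len bytes (1, 2 or 4) at register offset reg: one pwrite() call. */
int legacy_ioport_write(int fd, off_t reg, const void *buf, size_t len)
{
    ssize_t n = pwrite(fd, buf, len, reg);
    if (n < 0)
        return -errno;
    return n == (ssize_t)len ? 0 : -EIO;
}

/* Read len bytes (1, 2 or 4) at register offset reg: one pread() call. */
int legacy_ioport_read(int fd, off_t reg, void *buf, size_t len)
{
    ssize_t n = pread(fd, buf, len, reg);
    if (n < 0)
        return -errno;
    return n == (ssize_t)len ? 0 : -EIO;
}

/* Ring the doorbell for virtqueue `queue`: a 16-bit write to QUEUE_NOTIFY
 * in host byte order (legacy virtio uses guest-native endianness). */
int legacy_notify_queue(int fd, uint16_t queue)
{
    return legacy_ioport_write(fd, LEGACY_VIRTIO_QUEUE_NOTIFY,
                               &queue, sizeof(queue));
}
```

The helpers only use positioned file I/O, so they can be exercised against an ordinary temporary file; on a real device the same calls would go to the resource0 file and be forwarded by the kernel as port accesses.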

>
>>
>>
>> We are considering changes to the kernel that will potentially harm
>> the performance of any program that uses iopl(3) -- in particular,
>> context switches will become more expensive, and the scheduler might
>> need to explicitly penalize such programs to ensure fairness.  Using
>> ioperm() already hurts performance, and the proposed changes to iopl()
>> will make it even worse.  Alternatively, the kernel could drop iopl()
>> support entirely.  I will certainly make a change to allow
>> distributions to remove iopl() support entirely from their kernels,
>> and I expect that distributions will do this.
>>
>> Please fix DPDK.
>
> Please fix virtio.

Done, with the new version of virtio :)

Thread overview: 12+ messages
2019-10-25  4:45 Andy Lutomirski
2019-10-25  6:42 ` Willy Tarreau
2019-10-25 14:45   ` Andy Lutomirski
2019-10-25 15:03     ` Willy Tarreau
2019-10-27 23:44     ` Maciej W. Rozycki
2019-10-28 16:42   ` Stephen Hemminger
2019-10-28 18:00     ` Andy Lutomirski
2019-10-28 20:13     ` Willy Tarreau
2019-10-25  7:22 ` David Marchand
2019-10-25 16:13 ` Stephen Hemminger
2019-10-25 20:43   ` Thomas Gleixner
2019-10-26  0:27   ` Andy Lutomirski [this message]
