* [dpdk-dev] Please stop using iopl() in DPDK
From: Andy Lutomirski @ 2019-10-25 4:45 UTC
To: dev, Thomas Gleixner, Peter Zijlstra, LKML

Hi all-

Supporting iopl() in the Linux kernel is becoming a maintainability
problem. As far as I know, DPDK is the only major modern user of
iopl().

After doing some research, DPDK uses direct io port access for only a
single purpose: accessing legacy virtio configuration structures.
These structures are mapped in IO space in BAR 0 on legacy virtio
devices.

There are at least three ways you could avoid using iopl(). Here they
are in rough order of quality in my opinion:

1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
read() and write() on resource0 in sysfs.

2. Use the alternative access mechanism in the virtio legacy spec:
there is a way to access all of these structures via configuration
space.

3. Use ioperm() instead of iopl().

We are considering changes to the kernel that will potentially harm
the performance of any program that uses iopl(3) -- in particular,
context switches will become more expensive, and the scheduler might
need to explicitly penalize such programs to ensure fairness. Using
ioperm() already hurts performance, and the proposed changes to iopl()
will make it even worse. Alternatively, the kernel could drop iopl()
support entirely. I will certainly make a change to allow
distributions to remove iopl() support entirely from their kernels,
and I expect that distributions will do this.

Please fix DPDK.

Thanks,
Andy
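
[Editor's sketch, not part of the original message] Option 1 above
might look roughly like the following. The helper names are
hypothetical and are not DPDK's actual functions. The sketch assumes
that the file offset into the sysfs resource file is the register
offset inside the BAR and that only 1-, 2- or 4-byte accesses are
accepted; both points should be checked against the running kernel.
The path is passed in by the caller because the thread does not spell
it out.

```c
#define _XOPEN_SOURCE 700 /* for pread()/pwrite() */
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open the resource0 file of one PCI device. */
int sysfs_ioport_open(const char *path)
{
    return open(path, O_RDWR);
}

/* Assumption: only byte, word and dword accesses are meaningful. */
static int width_ok(size_t len)
{
    return len == 1 || len == 2 || len == 4;
}

/* Read one register: a single pread() replaces a single IN instruction. */
int sysfs_ioport_read(int fd, off_t reg, void *data, size_t len)
{
    ssize_t n;

    if (!width_ok(len)) {
        errno = EINVAL;
        return -1;
    }
    n = pread(fd, data, len, reg);
    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        errno = EIO; /* short transfer: treat as an I/O error */
        return -1;
    }
    return 0;
}

/* Write one register: a single pwrite() replaces a single OUT instruction. */
int sysfs_ioport_write(int fd, off_t reg, const void *data, size_t len)
{
    ssize_t n;

    if (!width_ok(len)) {
        errno = EINVAL;
        return -1;
    }
    n = pwrite(fd, data, len, reg);
    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        errno = EIO;
        return -1;
    }
    return 0;
}
```

Because every access goes through a file descriptor that belongs to a
single device, no iopl() or ioperm() call is needed in this scheme; the
price is one system call per register access.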

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Willy Tarreau @ 2019-10-25 6:42 UTC
To: Andy Lutomirski; +Cc: dev, Thomas Gleixner, Peter Zijlstra, LKML

Hi Andy,

On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
> Hi all-
>
> Supporting iopl() in the Linux kernel is becoming a maintainability
> problem. As far as I know, DPDK is the only major modern user of
> iopl().
>
> After doing some research, DPDK uses direct io port access for only a
> single purpose: accessing legacy virtio configuration structures.
> These structures are mapped in IO space in BAR 0 on legacy virtio
> devices.
>
> There are at least three ways you could avoid using iopl(). Here they
> are in rough order of quality in my opinion:
(...)

I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
to perform I/Os in the kernel without having to play at all with iopl()/
ioperm() ? That would alleviate the need for these large port maps.
Applications that use outb/inb() usually don't need extreme speeds.
Each time I had to use them, it was to access a watchdog, a sensor, a
fan, control a front panel LED, or read/write to NVRAM. Some userland
drivers possibly don't need much more, and very likely run with
privileges turned on all the time, so replacing their inb()/outb() calls
would mostly be a matter of redefining them using a macro to use the
syscall instead.

I'd see an API more or less like this :

  int ioport(int op, u16 port, long val, long *ret);

<op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUTL
to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
same with and/xor, and maybe a TEST operation to just validate support
at start time and replace ioperm/iopl so that subsequent calls do not
need to check for errors. Applications could then replace :

  ioperm() with ioport(TEST,port,0,0)
  iopl()   with ioport(TEST,0,0,0)
  outb()   with ioport(OUTB,port,val,0)
  inb()    with ({ char val;ioport(INB,port,0,&val);val;})

... and so on.

And then ioperm/iopl can easily be dropped.

Maybe I'm overlooking something ?
Willy
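
[Editor's sketch, not part of the original message] No ioport()
syscall exists; the proposal above can only be illustrated with a
userspace mock. The sketch below keeps the proposed prototype (with
uint16_t standing in for u16) and implements the byte-wide operations
on top of a file descriptor whose file offset is taken to be the port
number. The IOPORT_* names, the backend helper and the ENOSYS/EINVAL
error choices are all assumptions made for illustration. The
read-modify-write operations here are two separate calls, so they are
not atomic the way an in-kernel implementation could be.

```c
#define _XOPEN_SOURCE 700 /* for pread()/pwrite() */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical op codes; only the byte-wide subset is sketched. */
enum ioport_op {
    IOPORT_TEST, IOPORT_INB, IOPORT_OUTB,
    IOPORT_ORB, IOPORT_ANDB, IOPORT_XORB
};

static int ioport_fd = -1; /* backend: file offset == port number */

/* Hypothetical setup step standing in for "the kernel supports this". */
int ioport_open_backend(const char *path)
{
    ioport_fd = open(path, O_RDWR);
    return ioport_fd < 0 ? -1 : 0;
}

/* Mock of the proposed call: 0 on success, -1 with errno on failure. */
int ioport(int op, uint16_t port, long val, long *ret)
{
    uint8_t old = 0, out;

    if (ioport_fd < 0) {
        errno = ENOSYS; /* no backend: behave like a missing syscall */
        return -1;
    }
    if (op < IOPORT_TEST || op > IOPORT_XORB) {
        errno = EINVAL;
        return -1;
    }
    if (op == IOPORT_TEST)
        return 0; /* support check only, no port access */

    /* Every op except a plain OUTB starts by reading the port. */
    if (op != IOPORT_OUTB && pread(ioport_fd, &old, 1, port) != 1)
        return -1;
    if (ret && op != IOPORT_OUTB)
        *ret = old; /* INB result, or previous value for OR/AND/XOR */

    switch (op) {
    case IOPORT_INB:  return 0;
    case IOPORT_OUTB: out = (uint8_t)val;       break;
    case IOPORT_ORB:  out = old | (uint8_t)val; break;
    case IOPORT_ANDB: out = old & (uint8_t)val; break;
    default:          out = old ^ (uint8_t)val; break; /* IOPORT_XORB */
    }
    return pwrite(ioport_fd, &out, 1, port) == 1 ? 0 : -1;
}
```

With such a wrapper, an application's outb(val, port) could be
redefined as ioport(IOPORT_OUTB, port, val, 0), which is the kind of
macro substitution the message describes.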

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Andy Lutomirski @ 2019-10-25 14:45 UTC
To: Willy Tarreau; +Cc: Andy Lutomirski, dev, Thomas Gleixner, Peter Zijlstra, LKML

On Thu, Oct 24, 2019 at 11:42 PM Willy Tarreau <w@1wt.eu> wrote:
>
> Hi Andy,
>
> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
> > Hi all-
> >
> > Supporting iopl() in the Linux kernel is becoming a maintainability
> > problem. As far as I know, DPDK is the only major modern user of
> > iopl().
> >
> > After doing some research, DPDK uses direct io port access for only a
> > single purpose: accessing legacy virtio configuration structures.
> > These structures are mapped in IO space in BAR 0 on legacy virtio
> > devices.
> >
> > There are at least three ways you could avoid using iopl(). Here they
> > are in rough order of quality in my opinion:
> (...)
>
> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
> to perform I/Os in the kernel without having to play at all with iopl()/
> ioperm() ? That would alleviate the need for these large port maps.
> Applications that use outb/inb() usually don't need extreme speeds.
> Each time I had to use them, it was to access a watchdog, a sensor, a
> fan, control a front panel LED, or read/write to NVRAM. Some userland
> drivers possibly don't need much more, and very likely run with
> privileges turned on all the time, so replacing their inb()/outb() calls
> would mostly be a matter of redefining them using a macro to use the
> syscall instead.
>
> I'd see an API more or less like this :
>
>   int ioport(int op, u16 port, long val, long *ret);

Hmm. I have some memory of a /dev/ioport or similar, but now I can't
find it. It does seem quite reasonable.

But, for uses like DPDK, /sys/.../resource0 seems like a *far* better
API, since it actually uses the kernel's concept of which io range
corresponds to which device instead of hoping that the mappings don't
change out from under user code. And it has the added benefit that
it's restricted to a single device.

--Andy

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Willy Tarreau @ 2019-10-25 15:03 UTC
To: Andy Lutomirski; +Cc: dev, Thomas Gleixner, Peter Zijlstra, LKML

On Fri, Oct 25, 2019 at 07:45:47AM -0700, Andy Lutomirski wrote:
> But, for uses like DPDK, /sys/.../resource0 seems like a *far* better
> API, since it actually uses the kernel's concept of which io range
> corresponds to which device instead of hoping that the mappings don't
> change out from under user code. And it has the added benefit that
> it's restricted to a single device.

For certain such uses with real device management, very likely yes. It's
just that in a number of programs using hard-coded ports to access stupid
devices with no driver (and often even no name), such an approach could
be overkill, and these are typically the annoyingly itchy ones which
could require your config entry to remain enabled.

I'll add to my todo list to have a look at this as time permits.

Cheers,
Willy

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Maciej W. Rozycki @ 2019-10-27 23:44 UTC
To: Andy Lutomirski; +Cc: Willy Tarreau, dev, Thomas Gleixner, Peter Zijlstra, LKML

On Fri, 25 Oct 2019, Andy Lutomirski wrote:

> > I'd see an API more or less like this :
> >
> >   int ioport(int op, u16 port, long val, long *ret);
>
> Hmm. I have some memory of a /dev/ioport or similar, but now I can't
> find it. It does seem quite reasonable.

crw-r----- 1 root kmem 1, 4 Sep 9 13:58 /dev/port

  Maciej
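
[Editor's sketch, not part of the original message] The device node
shown above can be driven with ordinary file operations. The helpers
below are hypothetical names; they assume that the file offset selects
the port number and that each one-byte read() or write() performs one
byte-wide port access. The path is a parameter so that the same code
can be exercised against a regular file; on a real system it would be
"/dev/port" and would need the privileges implied by the permissions
in the listing.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: open a /dev/port-style file for port access. */
int devport_open(const char *path)
{
    return open(path, O_RDWR);
}

/* Byte-wide IN: seek to the port number, then read one byte. */
int devport_inb(int fd, uint16_t port, uint8_t *val)
{
    if (lseek(fd, port, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, val, 1) == 1 ? 0 : -1;
}

/* Byte-wide OUT: seek to the port number, then write one byte. */
int devport_outb(int fd, uint16_t port, uint8_t val)
{
    if (lseek(fd, port, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, &val, 1) == 1 ? 0 : -1;
}
```

The seek and the transfer are two separate calls sharing one file
position, so a multi-threaded program would need its own locking or
one descriptor per thread.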

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Stephen Hemminger @ 2019-10-28 16:42 UTC
To: Willy Tarreau; +Cc: Andy Lutomirski, dev, Thomas Gleixner, Peter Zijlstra, LKML

On Fri, 25 Oct 2019 08:42:25 +0200
Willy Tarreau <w@1wt.eu> wrote:

> Hi Andy,
>
> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
> > Hi all-
> >
> > Supporting iopl() in the Linux kernel is becoming a maintainability
> > problem. As far as I know, DPDK is the only major modern user of
> > iopl().
> >
> > After doing some research, DPDK uses direct io port access for only a
> > single purpose: accessing legacy virtio configuration structures.
> > These structures are mapped in IO space in BAR 0 on legacy virtio
> > devices.
> >
> > There are at least three ways you could avoid using iopl(). Here they
> > are in rough order of quality in my opinion:
> (...)
>
> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
> to perform I/Os in the kernel without having to play at all with iopl()/
> ioperm() ? That would alleviate the need for these large port maps.
> Applications that use outb/inb() usually don't need extreme speeds.
> Each time I had to use them, it was to access a watchdog, a sensor, a
> fan, control a front panel LED, or read/write to NVRAM. Some userland
> drivers possibly don't need much more, and very likely run with
> privileges turned on all the time, so replacing their inb()/outb() calls
> would mostly be a matter of redefining them using a macro to use the
> syscall instead.
>
> I'd see an API more or less like this :
>
>   int ioport(int op, u16 port, long val, long *ret);
>
> <op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUTL
> to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
> back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
> same with and/xor, and maybe a TEST operation to just validate support
> at start time and replace ioperm/iopl so that subsequent calls do not
> need to check for errors. Applications could then replace :
>
>   ioperm() with ioport(TEST,port,0,0)
>   iopl()   with ioport(TEST,0,0,0)
>   outb()   with ioport(OUTB,port,val,0)
>   inb()    with ({ char val;ioport(INB,port,0,&val);val;})
>
> ... and so on.
>
> And then ioperm/iopl can easily be dropped.
>
> Maybe I'm overlooking something ?
> Willy

DPDK does not want to make system calls. It kills performance.
With pure user mode access it can reach > 10 Million Packets/sec;
with a system call per packet that drops to 1 Million Packets/sec.

Also, adding new system calls might help in the long term,
but users are often on kernels that are at least 5 years behind
upstream.

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Andy Lutomirski @ 2019-10-28 18:00 UTC
To: Stephen Hemminger
Cc: Willy Tarreau, Andy Lutomirski, dev, Thomas Gleixner, Peter Zijlstra, LKML

> On Oct 28, 2019, at 10:43 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> On Fri, 25 Oct 2019 08:42:25 +0200
> Willy Tarreau <w@1wt.eu> wrote:
>
>> Hi Andy,
>>
>>> On Thu, Oct 24, 2019 at 09:45:56PM -0700, Andy Lutomirski wrote:
>>> Hi all-
>>>
>>> Supporting iopl() in the Linux kernel is becoming a maintainability
>>> problem. As far as I know, DPDK is the only major modern user of
>>> iopl().
>>>
>>> After doing some research, DPDK uses direct io port access for only a
>>> single purpose: accessing legacy virtio configuration structures.
>>> These structures are mapped in IO space in BAR 0 on legacy virtio
>>> devices.
>>>
>>> There are at least three ways you could avoid using iopl(). Here they
>>> are in rough order of quality in my opinion:
>> (...)
>>
>> I'm just wondering, why wouldn't we introduce a sys_ioport() syscall
>> to perform I/Os in the kernel without having to play at all with iopl()/
>> ioperm() ? That would alleviate the need for these large port maps.
>> Applications that use outb/inb() usually don't need extreme speeds.
>> Each time I had to use them, it was to access a watchdog, a sensor, a
>> fan, control a front panel LED, or read/write to NVRAM. Some userland
>> drivers possibly don't need much more, and very likely run with
>> privileges turned on all the time, so replacing their inb()/outb() calls
>> would mostly be a matter of redefining them using a macro to use the
>> syscall instead.
>>
>> I'd see an API more or less like this :
>>
>>   int ioport(int op, u16 port, long val, long *ret);
>>
>> <op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUTL
>> to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
>> back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
>> same with and/xor, and maybe a TEST operation to just validate support
>> at start time and replace ioperm/iopl so that subsequent calls do not
>> need to check for errors. Applications could then replace :
>>
>>   ioperm() with ioport(TEST,port,0,0)
>>   iopl()   with ioport(TEST,0,0,0)
>>   outb()   with ioport(OUTB,port,val,0)
>>   inb()    with ({ char val;ioport(INB,port,0,&val);val;})
>>
>> ... and so on.
>>
>> And then ioperm/iopl can easily be dropped.
>>
>> Maybe I'm overlooking something ?
>> Willy
>
> DPDK does not want to make system calls. It kills performance.
> With pure user mode access it can reach > 10 Million Packets/sec;
> with a system call per packet that drops to 1 Million Packets/sec.

If you are getting 10 MPPS with an OUT per packet, I’ll buy you a
whole case of beer.

I’m suggesting that, on virtio-legacy, you benchmark the performance
hit of using a syscall to ring the doorbell. Right now, you're doing
an OUT instruction that traps to the hypervisor, probably gets
emulated, and goes out to whatever host-side driver is running. The
cost of doing that is going to be quite high, especially on older
machines. I'm guessing that adding a syscall to the mix won't make
much difference.

--Andy

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Willy Tarreau @ 2019-10-28 20:13 UTC
To: Stephen Hemminger
Cc: Andy Lutomirski, dev, Thomas Gleixner, Peter Zijlstra, LKML

Hi Stephen,

On Mon, Oct 28, 2019 at 09:42:53AM -0700, Stephen Hemminger wrote:
(...)
> > I'd see an API more or less like this :
> >
> >   int ioport(int op, u16 port, long val, long *ret);
> >
> > <op> would take values such as INB,INW,INL to fill *<ret>, OUTB,OUTW,OUTL
> > to read from <val>, possibly ORB,ORW,ORL to read, or with <val>, write
> > back and return previous value to <ret>, ANDB/W/L, XORB/W/L to do the
> > same with and/xor, and maybe a TEST operation to just validate support
> > at start time and replace ioperm/iopl so that subsequent calls do not
> > need to check for errors. Applications could then replace :
> >
> >   ioperm() with ioport(TEST,port,0,0)
> >   iopl()   with ioport(TEST,0,0,0)
> >   outb()   with ioport(OUTB,port,val,0)
> >   inb()    with ({ char val;ioport(INB,port,0,&val);val;})
> >
> > ... and so on.
> >
> > And then ioperm/iopl can easily be dropped.
> >
> > Maybe I'm overlooking something ?
> > Willy
>
> DPDK does not want to make system calls. It kills performance.
> With pure user mode access it can reach > 10 Million Packets/sec;
> with a system call per packet that drops to 1 Million Packets/sec.

I know that it would cause this on the data path, but are you *really*
sure that in/out calls are performed there, because these are terribly
slow already ? I'd suspect that instead it's relying on read/write of
memory-mapped registers and descriptors. I really suspect that I/Os
are only used for configuration purposes, which is why I proposed the
stuff above (otherwise I obviously agree that syscalls in the data
path are performance killers).

> Also, adding new system calls might help in the long term,
> but users are often on kernels that are at least 5 years behind
> upstream.

Sure but that has never been really an issue, what matters is that
backwards compatibility is long enough to let old features smoothly
fade away. Some people make fun of me because I still care a bit about
kernel 2.4 and openssl 0.9.7 compatibility for haproxy, so yes, I am
careful about backwards compatibility and smooth upgrades ;-)

Willy

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: David Marchand @ 2019-10-25 7:22 UTC
To: Andy Lutomirski
Cc: dev, Thomas Gleixner, Peter Zijlstra, LKML, Maxime Coquelin, Tiwei Bie, Thomas Monjalon

Hello Andy,

On Fri, Oct 25, 2019 at 6:46 AM Andy Lutomirski <luto@kernel.org> wrote:
> Supporting iopl() in the Linux kernel is becoming a maintainability
> problem. As far as I know, DPDK is the only major modern user of
> iopl().

Thanks for reaching out.
Copying our virtio maintainers (Maxime and Tiwei), since they are the
first impacted by such a change.

> After doing some research, DPDK uses direct io port access for only a
> single purpose: accessing legacy virtio configuration structures.
> These structures are mapped in IO space in BAR 0 on legacy virtio
> devices.
>
> There are at least three ways you could avoid using iopl(). Here they
> are in rough order of quality in my opinion:
>
> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
> read() and write() on resource0 in sysfs.
>
> 2. Use the alternative access mechanism in the virtio legacy spec:
> there is a way to access all of these structures via configuration
> space.
>
> 3. Use ioperm() instead of iopl().

And you come with potential solutions, thanks :-)
We need to look at them and evaluate what is best from our point of view.
See how it impacts our ABI too (we decided on a freeze until 20.11).

> We are considering changes to the kernel that will potentially harm
> the performance of any program that uses iopl(3) -- in particular,
> context switches will become more expensive, and the scheduler might
> need to explicitly penalize such programs to ensure fairness. Using
> ioperm() already hurts performance, and the proposed changes to iopl()
> will make it even worse. Alternatively, the kernel could drop iopl()
> support entirely. I will certainly make a change to allow
> distributions to remove iopl() support entirely from their kernels,
> and I expect that distributions will do this.
>
> Please fix DPDK.

Unfortunately, we are currently closing our rc1 for the 19.11 release.
Not sure who is available, but I suppose we can work on this subject
in the 20.02 release timeframe.

Thanks.

--
David Marchand

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Stephen Hemminger @ 2019-10-25 16:13 UTC
To: Andy Lutomirski; +Cc: dev, Thomas Gleixner, Peter Zijlstra, LKML

On Thu, 24 Oct 2019 21:45:56 -0700
Andy Lutomirski <luto@kernel.org> wrote:

> Hi all-
>
> Supporting iopl() in the Linux kernel is becoming a maintainability
> problem. As far as I know, DPDK is the only major modern user of
> iopl().
>
> After doing some research, DPDK uses direct io port access for only a
> single purpose: accessing legacy virtio configuration structures.
> These structures are mapped in IO space in BAR 0 on legacy virtio
> devices.

Yes. Legacy virtio seems to have been designed without consideration
of how to use it in userspace. Xen, VMware and Hyper-V all use memory
as a doorbell mechanism which is easier to use from userspace.

> There are at least three ways you could avoid using iopl(). Here they
> are in rough order of quality in my opinion:
>
> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
> read() and write() on resource0 in sysfs.

The cost of entering the kernel for a doorbell mechanism is too
expensive and would kill performance.

> 2. Use the alternative access mechanism in the virtio legacy spec:
> there is a way to access all of these structures via configuration
> space.

There is no way to use memory doorbell on older versions of virtio.
Users want to run DPDK on old stuff like RHEL6 and even older
kernel forks. There are even use cases where virtio is used for
a non-Linux host; such as GCP.

> 3. Use ioperm() instead of iopl().

Ioperm has the wrong thread semantics. All DPDK applications have
multiple threads and the initialization logic needs to work even
if the thread is started later; threads can also be started by
the user application.

Iopl applies to whole process so this is not an issue.

>
>
> We are considering changes to the kernel that will potentially harm
> the performance of any program that uses iopl(3) -- in particular,
> context switches will become more expensive, and the scheduler might
> need to explicitly penalize such programs to ensure fairness. Using
> ioperm() already hurts performance, and the proposed changes to iopl()
> will make it even worse. Alternatively, the kernel could drop iopl()
> support entirely. I will certainly make a change to allow
> distributions to remove iopl() support entirely from their kernels,
> and I expect that distributions will do this.
>
> Please fix DPDK.

Please fix virtio.

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Thomas Gleixner @ 2019-10-25 20:43 UTC
To: Stephen Hemminger; +Cc: Andy Lutomirski, dev, Peter Zijlstra, LKML

On Fri, 25 Oct 2019, Stephen Hemminger wrote:
> On Thu, 24 Oct 2019 21:45:56 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
> > 3. Use ioperm() instead of iopl().
>
> Ioperm has the wrong thread semantics. All DPDK applications have
> multiple threads and the initialization logic needs to work even
> if the thread is started later; threads can also be started by
> the user application.
>
> Iopl applies to whole process so this is not an issue.

No. iopl is also per thread and not per process. That has been that way
forever. The man page is blatantly wrong.

Both iopl and ioperm are inherited on fork.

Thanks,

	tglx

* Re: [dpdk-dev] Please stop using iopl() in DPDK
From: Andy Lutomirski @ 2019-10-26 0:27 UTC
To: Stephen Hemminger
Cc: Andy Lutomirski, dev, Thomas Gleixner, Peter Zijlstra, LKML

> On Oct 25, 2019, at 9:13 AM, Stephen Hemminger <stephen@networkplumber.org> wrote:
>
> On Thu, 24 Oct 2019 21:45:56 -0700
> Andy Lutomirski <luto@kernel.org> wrote:
>
>> Hi all-
>>
>> Supporting iopl() in the Linux kernel is becoming a maintainability
>> problem. As far as I know, DPDK is the only major modern user of
>> iopl().
>>
>> After doing some research, DPDK uses direct io port access for only a
>> single purpose: accessing legacy virtio configuration structures.
>> These structures are mapped in IO space in BAR 0 on legacy virtio
>> devices.
>
> Yes. Legacy virtio seems to have been designed without consideration
> of how to use it in userspace. Xen, VMware and Hyper-V all use memory
> as a doorbell mechanism which is easier to use from userspace.
>
>
>> There are at least three ways you could avoid using iopl(). Here they
>> are in rough order of quality in my opinion:
>>
>> 1. Change pci_uio_ioport_read() and pci_uio_ioport_write() to use
>> read() and write() on resource0 in sysfs.
>
> The cost of entering the kernel for a doorbell mechanism is too
> expensive and would kill performance.
>
>
>> 2. Use the alternative access mechanism in the virtio legacy spec:
>> there is a way to access all of these structures via configuration
>> space.
>
> There is no way to use memory doorbell on older versions of virtio.
> Users want to run DPDK on old stuff like RHEL6 and even older
> kernel forks. There are even use cases where virtio is used for
> a non-Linux host; such as GCP.
>
>
>> 3. Use ioperm() instead of iopl().
>
> Ioperm has the wrong thread semantics. All DPDK applications have
> multiple threads and the initialization logic needs to work even
> if the thread is started later; threads can also be started by
> the user application.
>
> Iopl applies to whole process so this is not an issue.

This is not true. ioperm() and iopl() have identical thread semantics.
I think what you’re seeing is that you can set iopl(3) early without
knowing which port range to request. You could alternatively set
ioperm() early and ask for a very wide range.

In principle, we could make ioperm() be per thread, but I’m not sure
we should add that kind of complexity to support a mostly obsolete
use case like this. There's actually an argument to be made that
per-mm ioperm would be easier to handle in the kernel than per-task
due to the vagaries of KPTI.

All this being said, what are the actual performance implications of
write() to /sys/.../resource0? Off the top of my head, I would guess
that the actual OUTB or OUTL instruction itself is incredibly slow due
to being trapped and emulated and that virtio-legacy hypervisors
aren't particularly fast to begin with and that, as a result, the
write() might not actually matter that much.

>
>>
>>
>> We are considering changes to the kernel that will potentially harm
>> the performance of any program that uses iopl(3) -- in particular,
>> context switches will become more expensive, and the scheduler might
>> need to explicitly penalize such programs to ensure fairness. Using
>> ioperm() already hurts performance, and the proposed changes to iopl()
>> will make it even worse. Alternatively, the kernel could drop iopl()
>> support entirely. I will certainly make a change to allow
>> distributions to remove iopl() support entirely from their kernels,
>> and I expect that distributions will do this.
>>
>> Please fix DPDK.
>
> Please fix virtio.

Done, with the new version of virtio :)
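
[Editor's sketch, not part of the original message] The question of
what a write() to resource0 costs can only be settled by measuring
inside a guest with a legacy virtio device. A minimal timing helper
for that measurement might look like the following; the function name,
the two-byte access width and the iteration scheme are assumptions
made for illustration. Pointed at a regular file it only shows the
system call overhead, not the cost of the trapped OUT that the message
expects to dominate.

```c
#define _XOPEN_SOURCE 700 /* for pwrite() and clock_gettime() */
#include <stdint.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical benchmark helper: average wall-clock nanoseconds per
 * 2-byte pwrite() at register offset 'reg' of an open descriptor.
 * Returns a negative value on any error.
 */
double ns_per_pwrite(int fd, off_t reg, unsigned iters)
{
    struct timespec t0, t1;
    uint16_t v = 0; /* value written on every iteration */
    double ns;

    if (iters == 0)
        return -1.0;
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (unsigned i = 0; i < iters; i++) {
        if (pwrite(fd, &v, sizeof v, reg) != (ssize_t)sizeof v)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
       + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / iters;
}
```

Running the same loop once with direct OUT instructions and once with
this helper against the device's resource file would give the two
numbers being argued about in the thread.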

Thread overview: 12+ messages
2019-10-25  4:45 [dpdk-dev] Please stop using iopl() in DPDK Andy Lutomirski
2019-10-25  6:42 ` Willy Tarreau
2019-10-25 14:45   ` Andy Lutomirski
2019-10-25 15:03     ` Willy Tarreau
2019-10-27 23:44     ` Maciej W. Rozycki
2019-10-28 16:42   ` Stephen Hemminger
2019-10-28 18:00     ` Andy Lutomirski
2019-10-28 20:13     ` Willy Tarreau
2019-10-25  7:22 ` David Marchand
2019-10-25 16:13 ` Stephen Hemminger
2019-10-25 20:43   ` Thomas Gleixner
2019-10-26  0:27   ` Andy Lutomirski