* [dpdk-dev] Windows Support Plan
@ 2020-02-02 20:37 Dmitry Kozliuk
2020-02-03 9:15 ` [dpdk-dev] [EXTERNAL] " Stephen Hemminger
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Dmitry Kozliuk @ 2020-02-02 20:37 UTC (permalink / raw)
To: dev
Cc: Thomas Monjalon, Pallavi Kadam, Anatoly Burakov, Ranjit Menon,
Harini.Ramakrishnan, Stephen Hemminger
Hi everyone!
Where do I find a high-level plan of comprehensive Windows support: design
decisions, implementation order, etc?
Information on the subject is so scarce that one might think the effort is abandoned.
Googling for "site:dpdk.org/ml/archives/dev/ windows" yields only two pages
of disjoint messages. I learned about "netuio" days ago from a tiny remark in
a "Minutes of Technical Board Meetings" email, and even then it took
enumerating "dpdk-next-windows" branches to find the source.
The matter is, as a New Year's holiday project of mine I implemented Windows
support from scratch to the point it runs in QEMU with virtio-pci [0]. It is
not of production quality, cuts some corners and lacks major features (see
bottom). My primary goal was fun^W making it work. Comparing it to
"windpdk-v18.08" branch of "dpdk-next-windows", I can see that 1) our
implementations take rather different approaches in some cases, and 2) both
have severe issues and would benefit from amalgamation. I'd like to
contribute to Windows support with this code, but to do so, coordination is
required, because changes are significant.
Primary topics to discuss:
1. Memory management (@Anatoly)
1.1. MM has changed radically since v18.08, and dpdk-next-windows does not
implement it properly anyway: it allocates segment lists in a PCI bus
driver. My implementation closely follows the Linux one, using
VirtualAlloc2() with XXX_PLACEHOLDER flags to reserve and commit
memory, but does not map hugepages to files. Is there
a consensus on MM approach in Windows?
Anyway, I think EAL private MM API would have to be changed,
because memory reservation, allocation, and mapping are
completely different operations. Hiding this with an mmap() shim
doesn't look right, because mmap()'s behavior differs even among
Unix platforms.
1.2. In Windows, there is no /dev/mem to implement rte_virt2iova(),
so a simple kernel driver is required for mapping. Moreover,
Windows kernel abstracts IOMMU, so those physical addresses may
be unsuitable for DMA at all (see below).
2. Kernel drivers (@Harini, @Stephen)
2.1. The most serious issue is that Windows formally prohibits using
arbitrary physical addresses with DMA in favor of allocating
special buffers (presumably because IOMMU may be engaged, and
there is no way to check). We can either live with it
(technically, everything works with PA mode), or we could revive
DMA allocation API from ethdev to ask the driver for a proper
DMA buffer.
2.2. Neither netuio nor my driver (userpci) supports interrupts.
I see no inherent difficulty here, but the interface should be
designed carefully.
2.3. Windows allows mapping I/O ports into user-space, but there is
no API to change IOPL, which makes mapping useless and requires
a syscall for every I/O port access. This demolishes
virtio-legacy performance. Perhaps Microsoft could give some
advice here. OTOH, PIO is all legacy, so perhaps much effort is
not justified.
2.4. I believe the GUID approach to identifying compatible devices
should be strictly preferred over DosDevices symlinks. Think
of Mellanox OFED on Linux, which uses a different driver, but
could provide a compatible interface. Another reason is that
a single driver can implement multiple kernel interfaces with
appropriate GUIDs.
2.5. DPDK Windows driver guidelines, driver review, and certification.
The quality of both netuio and userpci is below standards now
(e.g. netuio does not mind its context when mapping memory,
and userpci lacks synchronization), code style is a mix of
Windows and DPDK, logging may be insufficient.
3. POSIX shim vs EAL wrappers (@Thomas, @Pallavi, @Ranjit)
What is the policy: to implement a POSIX shim in EAL (as the latest
patches from Pallavi Kadam do), or to add dependencies (as [1] suggests)?
IMO creating a shim is wrong. First, some POSIX concepts do not
easily map to Windows, like poll() interface and I/O model in
general. Second, there are numerous getopt, pthread, etc.
implementations for Windows, so there is no point in wasting resources
on repeating them and adding bugs. I can think of two exceptions:
* <sys/queue.h>, which is header-only.
* Berkeley sockets. Adding <winsock2.h> to public headers creates
more trouble than it's worth for the definitions of a few structures and
constants. Maybe there should be some <rte_socket.h> to abstract
platform differences.
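To make the Berkeley sockets exception concrete, here is a minimal sketch of what such a wrapper header could look like. Everything prefixed with `sketch_` or `SKETCH_` is hypothetical and is not an existing DPDK name; the Windows branch simply repeats the address-family values that <winsock2.h> defines, on the assumption that the public header must not include it.

```c
/* Hypothetical sketch of an <rte_socket.h>-style header; not part of DPDK. */
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
/* Values as defined by <winsock2.h>, repeated here so that the
 * public header does not need to include it. */
#define SKETCH_AF_INET  2
#define SKETCH_AF_INET6 23
#else
#include <sys/socket.h>
#define SKETCH_AF_INET  AF_INET
#define SKETCH_AF_INET6 AF_INET6
#endif

/* IPv4 address in network byte order, independent of struct in_addr. */
struct sketch_in_addr {
	uint32_t addr_be;
};

/* The layout must match the 4-byte wire format on every platform. */
static_assert(sizeof(struct sketch_in_addr) == 4, "unexpected padding");

/* Return the address size in bytes for a family, or 0 if unknown. */
static inline unsigned int
sketch_addr_size(int family)
{
	if (family == SKETCH_AF_INET)
		return 4;
	if (family == SKETCH_AF_INET6)
		return 16;
	return 0;
}
```

Public headers would then refer only to the prefixed names, and the platform headers would be pulled in by the implementation files that actually open sockets.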
Some highlights on my implementation:
* Major features NOT supported:
* multi-process (due to limited time)
* interrupts (limited time + explained above)
* eventdev (requires access to physical memory)
* hot-plug (due to limited time and Windows knowledge)
* bbdev (see comments in config/common_windows)
* FreeBSD (trivial, I just don't use it)
* DPDK is built using MinGW-w64 with GNUmake or Meson.
Drivers are built using DDK (msbuild or Visual Studio).
Actually, I cross-compile DPDK and build drivers natively.
* Only tested on Windows 10 in QEMU with virtio-legacy.
* No docs, but there's nothing unusual for those familiar with Windows.
Bind virt2phys driver to Root\virt2phys, bind userpci driver to device(s).
* Commit history is squashed, because it was a mess from experiments.
There also may be some leftover changes, but those commits are not proper
patches anyway.
References:
[0]: https://github.com/PlushBeaver/dpdk/commits/windows
[1]: http://mails.dpdk.org/archives/dev/2015-February/014245.html
--
Dmitry Kozlyuk
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] [EXTERNAL] Windows Support Plan
2020-02-02 20:37 [dpdk-dev] Windows Support Plan Dmitry Kozliuk
@ 2020-02-03 9:15 ` Stephen Hemminger
2020-02-03 18:18 ` Menon, Ranjit
2020-02-03 10:25 ` [dpdk-dev] " Burakov, Anatoly
2020-02-05 1:03 ` Thomas Monjalon
2 siblings, 1 reply; 8+ messages in thread
From: Stephen Hemminger @ 2020-02-03 9:15 UTC (permalink / raw)
To: Dmitry Kozliuk, dev
Cc: Thomas Monjalon, Pallavi Kadam, Anatoly Burakov, Ranjit Menon,
Harini Ramakrishnan
You should talk to the Windows DPDK developers.
They have been presenting regularly at DPDK summits. Look up the videos for more info.
The initial port is focused on running DPDK on bare metal with an Intel NIC. Your version looks more aligned with Windows as a guest in KVM.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] [EXTERNAL] Windows Support Plan
2020-02-03 9:15 ` [dpdk-dev] [EXTERNAL] " Stephen Hemminger
@ 2020-02-03 18:18 ` Menon, Ranjit
2020-02-03 22:13 ` Dmitry Kozlyuk
0 siblings, 1 reply; 8+ messages in thread
From: Menon, Ranjit @ 2020-02-03 18:18 UTC (permalink / raw)
To: Stephen Hemminger, Dmitry Kozliuk, dev
Cc: Thomas Monjalon, Kadam, Pallavi, Burakov, Anatoly, Harini Ramakrishnan
Dmitry...
There is a DPDK Windows community meeting every second Wednesday at 8:00am (Pacific Time).
If this time works for you, we can have Harini add you to this meeting series.
thanks,
ranjit m.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] [EXTERNAL] Windows Support Plan
2020-02-03 18:18 ` Menon, Ranjit
@ 2020-02-03 22:13 ` Dmitry Kozlyuk
0 siblings, 0 replies; 8+ messages in thread
From: Dmitry Kozlyuk @ 2020-02-03 22:13 UTC (permalink / raw)
To: Menon, Ranjit
Cc: Stephen Hemminger, dev, Thomas Monjalon, Kadam, Pallavi, Burakov,
Anatoly, Harini Ramakrishnan
> If this time works for you, we can have Harini add you to this meeting series.
Thanks Ranjit, Harini's sent me an invitation for this Wednesday.
--
Dmitry Kozlyuk
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] Windows Support Plan
2020-02-02 20:37 [dpdk-dev] Windows Support Plan Dmitry Kozliuk
2020-02-03 9:15 ` [dpdk-dev] [EXTERNAL] " Stephen Hemminger
@ 2020-02-03 10:25 ` Burakov, Anatoly
2020-02-08 20:09 ` Dmitry Kozlyuk
2020-02-05 1:03 ` Thomas Monjalon
2 siblings, 1 reply; 8+ messages in thread
From: Burakov, Anatoly @ 2020-02-03 10:25 UTC (permalink / raw)
To: Dmitry Kozliuk, dev
Cc: Thomas Monjalon, Pallavi Kadam, Ranjit Menon,
Harini.Ramakrishnan, Stephen Hemminger
On 02-Feb-20 8:37 PM, Dmitry Kozliuk wrote:
> Hi everyone!
>
Hi,
> Primary topics to discuss:
>
> 1. Memory management (@Anatoly)
>
> 1.1. MM changed radically since v18.08 and dpdk-next-windows does not
> implement it properly anyway, it allocates segment lists in a PCI bus
> driver. My implementation closely follows the Linux one using
> VirtualAlloc2() with XXX_PLACEHOLDER flags to reserve and commit
> memory, but does not map hugepages to files. Is there
> a consensus on MM approach in Windows?
>
> Anyway, I think EAL private MM API would have to be changed,
> because memory reservation, allocation, and mapping are
> completely different operations. Hiding this with an mmap() shim
> doesn't look right, because mmap()'s behavior differs even among
> Unix platforms.
>
> 1.2. In Windows, there is no /dev/mem to implement rte_virt2iova(),
> so a simple kernel driver is required for mapping. Moreover,
> Windows kernel abstracts IOMMU, so those physical addresses may
> be unsuitable for DMA at all (see below).
>
I haven't really been following the Windows port much, so I have no idea
how it currently works.
The main reason DPDK memory management works the way it does is the
need to support multiprocess. In order to map memory in all
processes, we need that space reserved (otherwise there's no guarantee
that the newly mapped memory segment will be mapped in all processes,
and it'll cause runtime failure). If it wasn't for that, we could
allocate memory arbitrarily and as needed. Windows should either follow
this model, or drop secondary support and go its own way - the internals
are OS-specific anyway.
If there are changes needed to private memalloc API to support the
above, that's completely fine - that's why all of this stuff is
internal-only :) As long as public API stays roughly the same, we should
be good. Bear in mind that DPDK also supports external memory; you might
need to make some allowances for that too.
As for IOMMU - we don't support IOVA as VA addressing on FreeBSD, so if
the Windows port can only work with IOVA as PA, that's fine too. The
question of IOVA mode really boils down to, do we control the DMA
addresses (IOVA as VA mode), or does the system (IOVA as PA). I'm not
familiar with how IOMMU works on Windows, but as long as it fits into
that model and we keep the API, it should also be OK :)
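The distinction between the two modes can be sketched in a few lines. The names below are hypothetical and only mirror the idea, not DPDK's actual definitions; `can_program_iommu` is an assumed input standing for "the platform lets us set up IOMMU mappings ourselves".

```c
#include <assert.h>
#include <stdint.h>

enum sketch_iova_mode {
	SKETCH_IOVA_PA, /* the system dictates DMA addresses: IOVA == physical */
	SKETCH_IOVA_VA, /* we program the IOMMU ourselves: IOVA == virtual */
};

/* Pick a mode depending on whether IOMMU mappings can be set up. */
static enum sketch_iova_mode
sketch_select_iova_mode(int can_program_iommu)
{
	return can_program_iommu ? SKETCH_IOVA_VA : SKETCH_IOVA_PA;
}

/* Address handed to the device for a buffer at virtual address va
 * whose physical address is pa. */
static uint64_t
sketch_iova(enum sketch_iova_mode mode, const void *va, uint64_t pa)
{
	if (mode == SKETCH_IOVA_VA)
		return (uint64_t)(uintptr_t)va;
	return pa;
}
```

Under this model, a platform that cannot set up mappings always ends up in the PA branch, which is the FreeBSD situation mentioned above.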
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] Windows Support Plan
2020-02-03 10:25 ` [dpdk-dev] " Burakov, Anatoly
@ 2020-02-08 20:09 ` Dmitry Kozlyuk
2020-02-11 10:05 ` Burakov, Anatoly
0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Kozlyuk @ 2020-02-08 20:09 UTC (permalink / raw)
To: Burakov, Anatoly
Cc: dev, Thomas Monjalon, Pallavi Kadam, Ranjit Menon,
Harini.Ramakrishnan, Stephen Hemminger
> The main reason DPDK memory management works the way it does is because
> of need to support multiprocess. In order to map memory in all
> processes, we need that space reserved (otherwise there's no guarantee
> that the newly mapped memory segment will be mapped in all processes,
> and it'll cause runtime failure). If it wasn't for that, we could
> allocate memory arbitrarily and as needed. Windows should either follow
> this model, or drop secondary support and go its own way - the internals
> are OS-specific anyway.
I think Windows should support multi-process, because there is a demand and
an ongoing design effort for multi-tenancy and resource arbitration [0].
Until the Windows kernel implements a "secure API" for the architecture proposed by
[0] (if it does at all), the DPDK multi-process model can to some extent support
the desired features. For example, a primary process may be a service
performing resource arbitration for applications that are secondary processes.
> Bear in mind that DPDK also supports external memory, you might
> need to make some allowances for that too.
I haven't considered external memory yet. Does it need anything beyond
mapping VA to IOVA?
> As for IOMMU - we don't support IOVA as VA addressing on FreeBSD, so if
> Windows port can only work with IOVA as PA, that's fine too. The
> question of IOVA mode really boils down to, do we control the DMA
> addresses (IOVA as VA mode), or does the system (IOVA as PA). I'm not
> familiar with how IOMMU works on Windows, but as long as it fits into
> that model and we keep the API, it should also be OK :)
AFAIK, Windows doesn't expose IOMMU either to applications or drivers. Do I
understand correctly that implies only IOVA as PA can be supported, because
mappings can't be set up?
The trouble is, PA cannot generally be used if an IOMMU is present, but there
is no way to tell if it is. The Windows kernel offers an API to allocate buffers for
DMA [1], but MM doesn't know whether it allocates memory for DMA or not, even if
that kernel API were exposed. If I got it right, DPDK just can't be used
on Windows with IOMMU enabled (can't tell for VMs that don't see an IOMMU).
[0]:
https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/RMenonOCardona_Improving-Security-in-Windows-DPDK.pdf
[1]:
https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_common_buffer_ex
--
Dmitry Kozlyuk
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] Windows Support Plan
2020-02-08 20:09 ` Dmitry Kozlyuk
@ 2020-02-11 10:05 ` Burakov, Anatoly
0 siblings, 0 replies; 8+ messages in thread
From: Burakov, Anatoly @ 2020-02-11 10:05 UTC (permalink / raw)
To: Dmitry Kozlyuk
Cc: dev, Thomas Monjalon, Pallavi Kadam, Ranjit Menon,
Harini.Ramakrishnan, Stephen Hemminger
On 08-Feb-20 8:09 PM, Dmitry Kozlyuk wrote:
>> The main reason DPDK memory management works the way it does is because
>> of need to support multiprocess. In order to map memory in all
>> processes, we need that space reserved (otherwise there's no guarantee
>> that the newly mapped memory segment will be mapped in all processes,
>> and it'll cause runtime failure). If it wasn't for that, we could
>> allocate memory arbitrarily and as needed. Windows should either follow
>> this model, or drop secondary support and go its own way - the internals
>> are OS-specific anyway.
>
> I think Windows should support multi-process, because there is a demand and
> an ongoing design effort for multi-tenancy and resource arbitration [0].
> Until Windows kernel implements "secure API" for the architecture proposed by
> [0] (if it does at all), DPDK multi-process model can to some point support
> the features desired. For example, a primary process may be a service
> performing resource arbitration for applications being secondary processes.
>
>> Bear in mind that DPDK also supports external memory, you might
>> need to make some allowances for that too.
>
> I haven't considered external memory yet. Does it need anything beyond
> mapping VA to IOVA?
No, just a few checks here and there. The brunt of the mapping is on the
shoulders of the user, and "external memory API" is really just
registering this memory with DPDK so that calls like rte_virt2memseg()
work correctly.
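To illustrate what "registering" amounts to, here is a minimal sketch of a registry that records user-provided regions and answers address lookups. All names are hypothetical and the fixed-size table is an assumption made for brevity; DPDK's real bookkeeping is more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_EXT 8

struct sketch_memseg {
	void *va;      /* start of the region, mapped by the user */
	size_t len;    /* region length in bytes */
	uint64_t iova; /* DMA address supplied by the user */
};

static struct sketch_memseg sketch_ext[SKETCH_MAX_EXT];
static unsigned int sketch_ext_count;

/* Record a region the caller has already mapped.
 * Returns 0 on success, -1 on bad arguments or a full table. */
static int
sketch_extmem_register(void *va, size_t len, uint64_t iova)
{
	if (va == NULL || len == 0 || sketch_ext_count == SKETCH_MAX_EXT)
		return -1;
	sketch_ext[sketch_ext_count++] =
		(struct sketch_memseg){ va, len, iova };
	return 0;
}

/* Find the registered region containing addr, or NULL if none does. */
static const struct sketch_memseg *
sketch_virt2memseg(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	for (unsigned int i = 0; i < sketch_ext_count; i++) {
		uintptr_t start = (uintptr_t)sketch_ext[i].va;

		if (a >= start && a - start < sketch_ext[i].len)
			return &sketch_ext[i];
	}
	return NULL;
}
```

Nothing here maps or allocates memory: the registry only makes lookups succeed for addresses the user has mapped on their own.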
>
>> As for IOMMU - we don't support IOVA as VA addressing on FreeBSD, so if
>> Windows port can only work with IOVA as PA, that's fine too. The
>> question of IOVA mode really boils down to, do we control the DMA
>> addresses (IOVA as VA mode), or does the system (IOVA as PA). I'm not
>> familiar with how IOMMU works on Windows, but as long as it fits into
>> that model and we keep the API, it should also be OK :)
>
> AFAIK, Windows doesn't expose IOMMU either to applications or drivers. Do I
> understand correctly that implies only IOVA as PA can be supported, because
> mappings can't be set up?
In general, yes. The name probably doesn't match very well, admittedly,
but that's the equivalent as far as software is concerned. That's not a
problem - FreeBSD doesn't support IOVA as VA either :)
>
> The trouble is, PA cannot generally be used if IOMMU is present, but there
> is no way to tell if it is. Windows kernel offers API to allocate buffers for
> DMA [1], but MM doesn't know if it allocates memory for DMA or not, even if
> that kernel API would be exposed. If I got it right, DPDK just can't be used
> on Windows with IOMMU enabled (can't tell for VMs that don't see IOMMU).
>
> [0]:
> https://www.dpdk.org/wp-content/uploads/sites/35/2018/12/RMenonOCardona_Improving-Security-in-Windows-DPDK.pdf
> [1]:
> https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nc-wdm-pallocate_common_buffer_ex
>
--
Thanks,
Anatoly
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [dpdk-dev] Windows Support Plan
2020-02-02 20:37 [dpdk-dev] Windows Support Plan Dmitry Kozliuk
2020-02-03 9:15 ` [dpdk-dev] [EXTERNAL] " Stephen Hemminger
2020-02-03 10:25 ` [dpdk-dev] " Burakov, Anatoly
@ 2020-02-05 1:03 ` Thomas Monjalon
2 siblings, 0 replies; 8+ messages in thread
From: Thomas Monjalon @ 2020-02-05 1:03 UTC (permalink / raw)
To: Dmitry Kozliuk
Cc: dev, Pallavi Kadam, Anatoly Burakov, Ranjit Menon,
Harini.Ramakrishnan, Stephen Hemminger
02/02/2020 21:37, Dmitry Kozliuk:
> Where do I find a high-level plan of comprehensive Windows support: design
> decisions, implementation order, etc?
Please help document the design decisions in the DPDK documentation.
For implementation order, we'll discuss it soon together.
> Information on the subject is very scarce, one may think it is abandoned.
> Googling for "site:dpdk.org/ml/archives/dev/ windows" yields only two pages
> of disjoint messages. I learned about "netuio" days ago from a tiny remark in
> a "Minutes of Technical Board Meetings" email, and even then it took
> enumerating "dpdk-next-windows" branches to find the source.
I agree.
I think Harini will address this lack of information.
> The matter is, as a New Year's holiday project of mine I implemented Windows
> support from scratch to the point it runs in QEMU with virtio-pci [0]. It is
> not of production quality, cuts some corners and lacks major features (see
> bottom). My primary goal was fun^W making it work. Comparing it to
> "windpdk-v18.08" branch of "dpdk-next-windows", I can see that 1) our
> implementations take rather different approaches in some cases, and 2) both
> have severe issues and would benefit from amalgamation. I'd like to
> contribute to Windows support with this code, but to do so, coordination is
> required, because changes are significant.
You are very welcome.
The work you already did looks amazing and it is very well presented.
[...]
> 3. POSIX shim vs EAL wrappers (@Thomas, @Pallavi, @Ranjit)
>
> What is the policy: to implement a POSIX shim in EAL (as the latest
> patches from Pallavi Kadam do), or to add dependencies (as [1] suggests)?
You are right, we should think about adding new dependencies which are
easily and generally available.
> IMO creating a shim is wrong.
I do not like the shim layer either.
> First, some POSIX concepts do not
> easily map to Windows, like poll() interface and I/O model in
> general. Second, there are numerous getopt, pthread, etc.
> implementations for Windows, so there is no point in wasting resources
> on repeating them and adding bugs. I can think of two exceptions:
>
> * <sys/queue.h>, which is header-only.
>
> * Berkeley sockets. Adding <winsock2.h> to public headers creates
> more trouble than it's worth for the definitions of a few structures and
> constants. Maybe there should be some <rte_socket.h> to abstract
> platform differences.
[...]
> * multi-process (due to limited time)
As Anatoly said, multi-process is not a priority.
This feature has a high cost, so we should think twice before deciding
to support it on Windows.
[...]
> [0]: https://github.com/PlushBeaver/dpdk/commits/windows
> [1]: http://mails.dpdk.org/archives/dev/2015-February/014245.html
Thanks a lot
^ permalink raw reply [flat|nested] 8+ messages in thread