DPDK patches and discussions
From: "Menon, Ranjit" <ranjit.menon@intel.com>
To: Stephen Hemminger <sthemmin@microsoft.com>,
	Dmitry Kozliuk <dmitry.kozliuk@gmail.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: Thomas Monjalon <thomas@monjalon.net>,
	"Kadam, Pallavi" <pallavi.kadam@intel.com>,
	"Burakov, Anatoly" <anatoly.burakov@intel.com>,
	Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>
Subject: Re: [dpdk-dev] [EXTERNAL] Windows Support Plan
Date: Mon, 3 Feb 2020 18:18:59 +0000
Message-ID: <7603DC8746F9FC4D82EF0929C467267A737C5A1E@ORSMSX112.amr.corp.intel.com>
In-Reply-To: <BN6PR21MB0836B8F7F26710CB2F3508ECCC000@BN6PR21MB0836.namprd21.prod.outlook.com>

Dmitry...
There is a DPDK Windows community meeting every second Wednesday at 8:00am (Pacific Time).
If this time works for you, we can have Harini add you to this meeting series.
thanks,

ranjit m.

From: Stephen Hemminger <sthemmin@microsoft.com>
Sent: Monday, February 3, 2020 1:16 AM
To: Dmitry Kozliuk <dmitry.kozliuk@gmail.com>; dev@dpdk.org
Cc: Thomas Monjalon <thomas@monjalon.net>; Kadam, Pallavi <pallavi.kadam@intel.com>; Burakov, Anatoly <anatoly.burakov@intel.com>; Menon, Ranjit <ranjit.menon@intel.com>; Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>
Subject: Re: [EXTERNAL] Windows Support Plan

You should talk to the Windows DPDK developers.
They have been presenting regularly at DPDK Summits; look up the videos for more information.

The initial port is focused on running DPDK on bare metal with Intel NICs. Your version looks more aligned with Windows as a guest in KVM.

________________________________
From: Dmitry Kozliuk <dmitry.kozliuk@gmail.com>
Sent: Sunday, February 2, 2020 9:37:36 PM
To: dev@dpdk.org
Cc: Thomas Monjalon <thomas@monjalon.net>; Pallavi Kadam <pallavi.kadam@intel.com>; Anatoly Burakov <anatoly.burakov@intel.com>; Ranjit Menon <ranjit.menon@intel.com>; Harini Ramakrishnan <Harini.Ramakrishnan@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>
Subject: [EXTERNAL] Windows Support Plan

Hi everyone!

Where can I find a high-level plan for comprehensive Windows support: design
decisions, implementation order, etc.?

Information on the subject is so scarce that one might think the effort was abandoned.
Googling for "site:dpdk.org/ml/archives/dev/ windows" yields only two pages
of disjoint messages. I learned about "netuio" days ago from a tiny remark in
a "Minutes of Technical Board Meetings" email, and even then it took
enumerating "dpdk-next-windows" branches to find the source.

The thing is, as a New Year's holiday project, I implemented Windows
support from scratch to the point where it runs in QEMU with virtio-pci [0].
It is not of production quality, cuts some corners, and lacks major features
(see the list at the bottom). My primary goal was fun^W making it work.
Comparing it to the "windpdk-v18.08" branch of "dpdk-next-windows", I can see
that 1) our implementations take rather different approaches in some cases,
and 2) both have severe issues and would benefit from amalgamation. I'd like
to contribute this code to Windows support, but that requires coordination,
because the changes are significant.


Primary topics to discuss:

1. Memory management (@Anatoly)

   1.1. MM has changed radically since v18.08, and dpdk-next-windows does
        not implement it properly anyway: it allocates segment lists in a
        PCI bus driver. My implementation closely follows the Linux one, using
        VirtualAlloc2() with XXX_PLACEHOLDER flags to reserve and commit
        memory, but does not map hugepages to files. Is there
        a consensus on MM approach in Windows?

        Anyway, I think the EAL private MM API would have to be changed,
        because memory reservation, allocation, and mapping are
        completely different operations. Hiding this with an mmap() shim
        doesn't look right, because mmap()'s behavior differs even among
        Unix platforms.

   1.2. In Windows, there is no /dev/mem to implement rte_virt2iova(),
        so a simple kernel driver is required for the translation. Moreover,
        the Windows kernel abstracts the IOMMU, so those physical addresses
        may be unsuitable for DMA at all (see below).

2. Kernel drivers (@Harini, @Stephen)

   2.1. The most serious issue is that Windows formally prohibits using
        arbitrary physical addresses with DMA in favor of allocating
        special buffers (presumably because an IOMMU may be engaged, and
        there is no way to check). We can either live with it
        (technically, everything works in PA mode), or we could revive
        the DMA allocation API in ethdev to ask the driver for a proper
        DMA buffer.

   2.2. Neither netuio nor my driver (userpci) supports interrupts.
        I see no inherent difficulty here, but the interface should be
        designed carefully.

   2.3. Windows allows mapping I/O ports into user space, but there is
        no API to change the IOPL, which makes the mapping useless and
        requires a syscall for every I/O port access. This ruins
        virtio-legacy performance. Perhaps Microsoft could give some
        advice here. OTOH, PIO is all legacy, so perhaps much effort
        is not justified.

   2.4. I believe the GUID approach to identifying compatible devices
        should be strictly preferred over DosDevices symlinks. Think
        of Mellanox OFED on Linux, which uses a different driver, but
        could provide a compatible interface. Another reason is that
        a single driver can implement multiple kernel interfaces with
        appropriate GUIDs.

   2.5. DPDK Windows driver guidelines, driver review, and certification.
        The quality of both netuio and userpci is currently below standard
        (e.g. netuio ignores its execution context when mapping memory,
        and userpci lacks synchronization), the code style is a mix of
        Windows and DPDK, and logging may be insufficient.

3. POSIX shim vs EAL wrappers (@Thomas, @Pallavi, @Ranjit)

   What is the policy: to implement a POSIX shim in EAL (as the latest
   patches from Pallavi Kadam do), or to add dependencies (as [1] suggests)?
   IMO creating a shim is wrong. First, some POSIX concepts do not
   easily map to Windows, like the poll() interface and the I/O model in
   general. Second, there are numerous getopt, pthread, etc.
   implementations for Windows; there is no point in wasting resources
   repeating them and adding bugs. I can think of two exceptions:

   * <sys/queue.h>, which is header-only.

   * Berkeley sockets. Adding <winsock2.h> to public headers creates
     more trouble than it's worth just for the definitions of a few
     structures and constants. Maybe there should be some <rte_socket.h>
     to abstract platform differences.


Some highlights on my implementation:

* Major features NOT supported:

  * multi-process (due to limited time)
  * interrupts (limited time + explained above)
  * eventdev (requires access to physical memory)
  * hot-plug (due to limited time and Windows knowledge)
  * bbdev (see comments in config/common_windows)
  * FreeBSD (trivial, I just don't use it)

* DPDK is built using MinGW-w64 with GNU Make or Meson.
  Drivers are built using DDK (msbuild or Visual Studio).
  Actually, I cross-compile DPDK and build drivers natively.

* Only tested on Windows 10 in QEMU with virtio-legacy.

* No docs, but there's nothing unusual for those familiar with Windows.
  Bind the virt2phys driver to Root\virt2phys and the userpci driver to the device(s).

* Commit history is squashed, because it was a mess from experiments.
  There also may be some leftover changes, but those commits are not proper
  patches anyway.


References:

[0]: https://github.com/PlushBeaver/dpdk/commits/windows
[1]: http://mails.dpdk.org/archives/dev/2015-February/014245.html

--
Dmitry Kozlyuk

Thread overview: 8+ messages
2020-02-02 20:37 [dpdk-dev] " Dmitry Kozliuk
2020-02-03  9:15 ` [dpdk-dev] [EXTERNAL] " Stephen Hemminger
2020-02-03 18:18   ` Menon, Ranjit [this message]
2020-02-03 22:13     ` Dmitry Kozlyuk
2020-02-03 10:25 ` [dpdk-dev] " Burakov, Anatoly
2020-02-08 20:09   ` Dmitry Kozlyuk
2020-02-11 10:05     ` Burakov, Anatoly
2020-02-05  1:03 ` Thomas Monjalon
