From: "Zhou, Danny" <danny.zhou@intel.com>
To: Alex Markuze <alex@weka.io>, Thomas Monjalon <thomas.monjalon@6wind.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>,
"Fastabend, John R" <john.r.fastabend@intel.com>
Subject: Re: [dpdk-dev] bifurcated driver
Date: Wed, 5 Nov 2014 22:19:32 +0000
Message-ID: <DFDF335405C17848924A094BC35766CF0A98FB4D@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <CAKfHP0XLcRhYHNfvPGM=stxqNGyqtLBEyME1cBqiZbPPGa53Bw@mail.gmail.com>
From: Alex Markuze [mailto:alex@weka.io]
Sent: Wednesday, November 05, 2014 11:19 PM
To: Thomas Monjalon
Cc: Zhou, Danny; dev@dpdk.org; Fastabend, John R
Subject: Re: [dpdk-dev] bifurcated driver
On Wed, Nov 5, 2014 at 5:14 PM, Alex Markuze <alex@weka.io> wrote:
On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
Hi Danny,
2014-10-31 17:36, O'driscoll, Tim:
> Bifurcated Driver (Danny.Zhou@intel.com)
Thanks for the presentation of bifurcated driver during the community call.
I asked if you looked at ibverbs and you wanted a link to check.
The kernel module is here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
The userspace library:
http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
Extract from Kconfig:
"
config INFINIBAND_USER_ACCESS
tristate "InfiniBand userspace access (verbs and CM)"
select ANON_INODES
---help---
Userspace InfiniBand access support. This enables the
kernel side of userspace verbs and the userspace
communication manager (CM). This allows userspace processes
to set up connections and directly access InfiniBand
hardware for fast-path operations. You will also need
libibverbs, libibcm and a hardware driver library from
<http://www.openfabrics.org/git/>.
"
It seems to be close to the bifurcated driver needs.
Not sure if it can solve the security issues if there is no dedicated MMU
in the NIC.
Mellanox NICs and other RDMA HW (InfiniBand/RoCE/iWARP) have MTT units - memory translation units - that act as a dedicated MMU. They are filled via ibv_reg_mr calls, each of which creates a process-VM to physical/iova memory mapping in the NIC. Thus each process can access only its own memory via the NIC. This is how RNICs* resolve the security issue. I'm not sure how standard Intel NICs could support this scheme.
DZ: Intel NICs do not provide such an embedded memory translation unit, but Intel chipsets support an IOMMU, a generic memory protection mechanism that provides physical/iova memory mapping for DMA transactions from any PCIe device, not only NICs.
There is already a 6wind PMD for Mellanox NICs. I'm assuming this PMD is verbs based and behaves similarly to the proposed bifurcated driver.
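As a rough illustration of the protection idea discussed above (whether the table lives in the NIC's MTT or in the chipset IOMMU), here is a toy Python model. It is not libibverbs or IOMMU code; the class, method names and addresses are all hypothetical, and it assumes each registered region is physically contiguous, which real hardware does not require.

```python
# Toy model of a per-process DMA translation table.
# All names here are hypothetical; this is not the libibverbs or IOMMU API.
PAGE_SIZE = 4096


class TranslationTable:
    """Maps (process id, virtual page) -> physical page."""

    def __init__(self):
        self._entries = {}

    def register(self, pid, virt_addr, phys_addr, length):
        """Add one entry per page touched by [virt_addr, virt_addr + length).

        Simplification: the region is assumed physically contiguous.
        """
        if length <= 0:
            raise ValueError("length must be positive")
        first = virt_addr // PAGE_SIZE
        last = (virt_addr + length - 1) // PAGE_SIZE
        phys_page = phys_addr // PAGE_SIZE
        for i, vpage in enumerate(range(first, last + 1)):
            self._entries[(pid, vpage)] = phys_page + i

    def translate(self, pid, virt_addr):
        """Return the physical address, or refuse if pid never registered the page."""
        vpage, offset = divmod(virt_addr, PAGE_SIZE)
        try:
            ppage = self._entries[(pid, vpage)]
        except KeyError:
            raise PermissionError(
                f"pid {pid} has no mapping for address {virt_addr:#x}"
            ) from None
        return ppage * PAGE_SIZE + offset


table = TranslationTable()
table.register(pid=100, virt_addr=0x10000, phys_addr=0x200000, length=8192)
print(hex(table.translate(100, 0x10010)))   # process 100 reaches its own buffer
try:
    table.translate(200, 0x10010)           # process 200 never registered it
except PermissionError as err:
    print("blocked:", err)
```

The point of the sketch is only that a DMA request is looked up by the owner of the mapping as well as by address, so a process cannot reach memory it did not register.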
http://www.mellanox.com/page/press_release_item?id=979
DZ: Is it open sourced for the community to use? I guess the answer is no. Also, that PMD has presumably ported the majority of the Mellanox kernel driver code to DPDK, since a lot of NIC control code is needed, while the bifurcated driver approach only needs to support the minimum Mellanox NIC specific packet rx/tx routines to achieve the high performance DPDK claims, using all the DPDK performance optimization techniques such as huge pages, fixed-size packet buffers, zero-copy, PMD, etc. The kernel driver retains NIC control, without porting it to DPDK.
One thing that I don't understand (and will be happy if someone could shed some light on) is how the NIC is supposed to distinguish between packets that need to go to the kernel driver rings and packets going to user space rings.
DZ: It depends on the user. The user should use standard ethtool (see the examples below) to enable flow director and distribute packets to a kernel-owned or user-space-owned rx queue by specifying a 5-tuple as well as the destination rxq index. The flow director embedded in the NIC does the flow classification and distribution, rather than a software approach like DPDK KNI. If you argue that SRIOV has a similar rx/tx queue pair partitioning capability, I would say the bifurcated driver approach provides much more flexibility than SRIOV (e.g., a variable number of queue pairs allocated to user space, and L3 5-tuple based flow classification and distribution rather than SRIOV's L2 classification based on MAC or VLAN).
ethtool -K ethX ntuple on # enable flow director
ethtool -N ethX flow-type udp4 src-ip 0.0.0.0 action 0 # distribute udp packets with source IP 0.0.0.0 to rx queue No.0
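To make the steering behaviour concrete, the following is a toy Python model of n-tuple classification: each rule names the fields it cares about plus a destination rx queue, and the first matching rule wins. It only mimics the idea; real flow director filtering happens in NIC hardware, and the rule layout, the fallback queue and the split between kernel-owned and user-space-owned queues are assumptions made for the example.

```python
# Toy model of n-tuple flow steering; hypothetical names, not an ethtool or NIC API.
from dataclasses import dataclass
from typing import Optional

FIELDS = ("proto", "src_ip", "dst_ip", "src_port", "dst_port")


@dataclass(frozen=True)
class Packet:
    proto: str
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class Rule:
    rx_queue: int                     # destination rx queue index
    proto: Optional[str] = None       # None means "match anything"
    src_ip: Optional[str] = None
    dst_ip: Optional[str] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None

    def matches(self, pkt: Packet) -> bool:
        """True if every field this rule specifies equals the packet's field."""
        for name in FIELDS:
            want = getattr(self, name)
            if want is not None and want != getattr(pkt, name):
                return False
        return True


def classify(pkt: Packet, rules, default_queue: int = 0) -> int:
    """Return the queue of the first matching rule, else the default queue."""
    for rule in rules:
        if rule.matches(pkt):
            return rule.rx_queue
    return default_queue


# Assumption for the example: queue 0 is kernel-owned, queue 4 is user-space-owned.
rules = [Rule(rx_queue=4, proto="udp", dst_ip="10.0.0.2", dst_port=9000)]
fast = Packet("udp", "10.0.0.1", "10.0.0.2", 1234, 9000)
slow = Packet("tcp", "10.0.0.1", "10.0.0.2", 1234, 22)
print(classify(fast, rules), classify(slow, rules))
```

In this model, traffic that matches no rule stays on the default queue, which corresponds to the kernel driver continuing to see everything the user did not explicitly steer away.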
I feel we should sum up pros and cons of
- igb_uio
- uio_pci_generic
- VFIO
- ibverbs
- bifurcated driver
I suggest considering these criteria:
- upstream status
- usable with kernel netdev
- usable in a vm
- usable for ethernet
- hardware requirements
- security protection
- performance
Regarding IBVERBS - I'm not sure how it's relevant to future DPDK development, but this is the rundown as I know it.
This is a veteran package called OFED, or its counterpart Mellanox OFED.
---- The kernel drivers are upstream
---- The PCI dev stays in the kernel's care throughout its life span
---- SRIOV support exists; paravirt support exists only (AFAIK) as an Office of the CTO (VMware) project called vRDMA
---- Eth/RoCE (RDMA over Converged Ethernet)/IB
=== HW === RDMA capable HW ONLY.
---- Security is designed into RDMA HW
---- Stellar performance - Favored by HPC.
*RNIC - RDMA (Remote DMA - iWARP/InfiniBand/RoCE) capable NICs.
--
Thomas