From: Bruce Richardson <bruce.richardson@intel.com>
To: "Walukiewicz, Miroslaw" <Miroslaw.Walukiewicz@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
Date: Tue, 25 Nov 2014 15:02:58 +0000
Message-ID: <20141125150257.GB6800@bricha3-MOBL3>
In-Reply-To: <7C4248CAE043B144B1CD242D275626532FE0E6E5@IRSMSX104.ger.corp.intel.com>
On Tue, Nov 25, 2014 at 02:57:13PM +0000, Walukiewicz, Miroslaw wrote:
> Thank you, Bruce, for the explanation of the idea.
Actually, credit goes to Steve Liang, not me, for the explanation. :-)
>
> I have a question regarding TCP SYN packets: do you have any idea how to share TCP SYN requests between the kernel and a user-space application?
As I'm giving the credit to Steve, I'll also pass the buck for answering that
question to him too! :-)
/Bruce
>
> Regards,
>
> Mirek
>
> > -----Original Message-----
> > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > Sent: Tuesday, November 25, 2014 3:30 PM
> > To: Neil Horman
> > Cc: dev@dpdk.org
> > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> >
> > On Tue, Nov 25, 2014 at 09:23:16AM -0500, Neil Horman wrote:
> > > On Tue, Nov 25, 2014 at 10:11:16PM +0800, Cunming Liang wrote:
> > > >
> > > > This is an RFC patch set to support the "bifurcated driver" in DPDK.
> > > >
> > > >
> > > > What is "bifurcated driver"?
> > > > ===========================
> > > >
> > > > "Bifurcated driver" refers to a kernel NIC driver that supports:
> > > >
> > > > 1. on-demand split-off of rx/tx queue pairs and their assignment to user space
> > > >
> > > > 2. direct access to NIC resources (e.g. rx/tx queue registers) from user space
> > > >
> > > > 3. distribution of packets to kernel or user-space rx queues by the
> > > > NIC's flow director, according to the filter rules
> > > >
> > > > Here's the kernel patch set that adds this support:
> > > > http://comments.gmane.org/gmane.linux.network/333615
> > > >
> > > >
> > > > Usage scenario
> > > > =================
> > > >
> > > > It's well accepted in the industry to use DPDK to process fast-path packets
> > > > in user space in a high-performance fashion. Meanwhile, processing slow-path
> > > > control packets in kernel space is still needed, as those packets usually
> > > > rely on the in-kernel TCP/IP stack and/or the socket programming interface.
> > > >
> > > > The KNI (Kernel NIC Interface) mechanism in DPDK is designed to meet this
> > > > requirement, with the following limitations:
> > > >
> > > > 1) Software classifies packets and distributes them to the kernel via DPDK
> > > > software rings, at the cost of significant CPU cycles and memory bandwidth.
> > > >
> > > > 2) Copying packets between the kernel's socket buffers and mbufs has a
> > > > significant negative impact on KNI performance.
> > > >
> > > > The bifurcated driver provides an alternative approach that not only offloads
> > > > flow classification and distribution to the NIC but also supports zero-copy
> > > > packet handling.
> > > >
> > > > Users can use standard ethtool to add filter rules to the NIC that
> > > > distribute specific flows to queues accessed only by the kernel driver and
> > > > stack, and add other rules that distribute packets to the queues assigned to
> > > > user space.
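
To make the rule-to-queue split described above concrete, here is a minimal
toy model in Python. It is only a sketch under stated assumptions: the names
Rule, classify, owner and FIRST_USER_QUEUE are hypothetical, the boundary value
32 is simply borrowed from the ethtool example further down in this mail, and
the fallback to queue 0 for unmatched packets is a simplification (real
hardware would typically fall back to its normal receive steering). It does
not reflect how the ixgbe flow director is implemented.

```python
from dataclasses import dataclass

# Hypothetical boundary: rx queues below this index stay with the kernel,
# queues at or above it are treated as split off to user space. The value is
# only borrowed from the example in this mail (rxq 0 vs. rxq 32); it is not a
# constant defined by the patch set.
FIRST_USER_QUEUE = 32


@dataclass(frozen=True)
class Rule:
    """A simplified filter rule: match on flow type and source IP only."""
    flow_type: str
    src_ip: str
    queue: int


def classify(rules, flow_type, src_ip, default_queue=0):
    """Return the rx queue chosen by the first matching rule.

    Unmatched packets go to default_queue, which is an assumption made to
    keep the model small.
    """
    for rule in rules:
        if rule.flow_type == flow_type and rule.src_ip == src_ip:
            return rule.queue
    return default_queue


def owner(queue):
    """Say which side services a given rx queue in this toy model."""
    return "user" if queue >= FIRST_USER_QUEUE else "kernel"


# The two rules used in the example later in this mail.
rules = [
    Rule("udp4", "0.0.0.0", 0),
    Rule("udp4", "1.1.1.1", 32),
]

for flow_type, src_ip in [("udp4", "0.0.0.0"), ("udp4", "1.1.1.1"), ("tcp4", "2.2.2.2")]:
    queue = classify(rules, flow_type, src_ip)
    print(flow_type, src_ip, "-> rxq", queue, "owned by", owner(queue))
```

The point of the model is only that the classification decision lives in the
rule table (i.e. in the NIC), so neither side has to inspect packets that
belong to the other.
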
> > > >
> > > > For those rx/tx queue pairs that are accessed directly from user space,
> > > > DPDK takes over packet rx/tx as well as the corresponding DMA operations
> > > > for high-performance packet I/O.
> > > >
> > > >
> > > > What's the impact and change to DPDK
> > > > ======================================
> > > >
> > > > DPDK usually binds PCIe NIC devices by leveraging the kernel's user-space
> > > > driver mechanisms, UIO or VFIO, to map the NIC's entire PCIe I/O space to
> > > > user space. The bifurcated driver PMD talks to a NIC interface using raw
> > > > socket APIs and only mmap()s a limited I/O space (e.g. certain 4K pages)
> > > > for accessing the rx/tx queue pairs involved. So the impact and changes
> > > > mainly come from the following:
> > > >
> > > > - netdev
> > > > DPDK needs to create an af_packet socket and bind it to a bifurcated netdev.
> > > > The socket fd will be used to request 'queue pairs info',
> > > > 'split/return queue pairs', etc. The PCIe device ID, netdev MAC address
> > > > and NUMA info also come from the netdev response.
> > > >
> > > > - PCIe device scan and driver probe
> > > > The netdev provides the PCIe device ID information, and the correct driver
> > > > is selected based on that device ID. For such a netdev device, the PCIe
> > > > device is no longer created by a scan but by on-demand assignment.
> > > >
> > > > - PCIe BAR mapping
> > > > The "bifurcated driver" maps several pages for the queue pairs.
> > > > The rest of the BAR register space maps to a fake page. The BAR mapping
> > > > goes through mmap on the sockfd, which is a little different from what
> > > > UIO/VFIO does.
> > > >
> > > > - PMD
> > > > The PMD no longer really initializes and configures the NIC.
> > > > Instead, it only takes care of queue pair setup, rx_burst and tx_burst.
> > > >
> > > > The patch uses the eal '--vdev' parameter to assign the netdev iface name
> > > > and the number of queue pairs. Here's an example of how to configure the
> > > > bifurcated driver and run DPDK testpmd with the bifurcated PMD.
> > > >
> > > > 1. Set promisc mode
> > > > > ifconfig eth0 promisc
> > > >
> > > > 2. Turn on fdir
> > > > > ethtool -K eth0 ntuple on
> > > >
> > > > 3. Set up a flow director rule to distribute packets with source ip
> > > > 0.0.0.0 to rxq No.0
> > > > > ethtool -N eth0 flow-type udp4 src-ip 0.0.0.0 action 0
> > > >
> > > > 4. Run testpmd on netdev 'eth0' with 1 queue pair.
> > > > > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > > --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > > -i --rxfreet=32 --txfreet=32 --txrst=32
> > > > Note:
> > > > The iface and qpairs arguments above specify the netdev interface name
> > > > and the number of qpairs that user space requests from the "bifurcated
> > > > driver", respectively.
> > > >
> > > > 5. Set up a flow director rule to distribute packets with source ip
> > > > 1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
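
For anyone who wants to script the ethtool steps above, here is a small Python
sketch that only builds the command lines; it does not run them. The helper
names (enable_ntuple_cmd, udp4_src_rule_cmd) are made up for illustration, and
the only ethtool options used are the ones that appear in the steps above.
The split into "before testpmd" and "after testpmd" follows the remark in
step 5.

```python
import ipaddress
import shlex


def enable_ntuple_cmd(iface):
    """Command line for step 2: turn on ntuple (flow director) filtering."""
    return ["ethtool", "-K", iface, "ntuple", "on"]


def udp4_src_rule_cmd(iface, src_ip, queue):
    """Command line for steps 3 and 5: steer udp4 packets by source IP."""
    ipaddress.IPv4Address(src_ip)  # raises ValueError on a malformed address
    if queue < 0:
        raise ValueError("queue index must be non-negative")
    return ["ethtool", "-N", iface, "flow-type", "udp4",
            "src-ip", src_ip, "action", str(queue)]


# Rules that can be installed before testpmd starts (kernel-owned queue).
before_testpmd = [
    enable_ntuple_cmd("eth0"),
    udp4_src_rule_cmd("eth0", "0.0.0.0", 0),
]

# Rule that, per step 5, has to be installed after testpmd has started.
after_testpmd = [
    udp4_src_rule_cmd("eth0", "1.1.1.1", 32),
]

for cmd in before_testpmd + after_testpmd:
    print(shlex.join(cmd))
```

Each list element is an argv-style list, so it could be handed to
subprocess.run() as-is on a machine where the bifurcated kernel driver is
actually loaded.
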
> > > >
> > > > The detailed changes in this patch set are described below.
> > > >
> > > > eal
> > > > --------
> > > > The first two patches are all about the eal API declaration and its Linux
> > > > definition to support the af_packet socket and the verbs of a bifurcated
> > > > netdev. Those APIs include verbs like open, bind, (un)map, split/return
> > > > and map_umem, as well as other APIs like set_pci, get_ifinfo and
> > > > get/put_devargs, which help to generate a pci device from a bifurcated
> > > > netdev and get basic netdev info.
> > > >
> > > > The third patch allows probing a driver on the PCIe VDEV created from
> > > > a NIC interface driven by the "bifurcated driver". It defines a new flag,
> > > > 'RTE_PCI_DRV_BIFURC', used for the direct ring access PMD.
> > > >
> > > > librte_bifurc
> > > > ---------------
> > > > The library is used as a VDEV bus driver to scan '--vdev=rte_bifurc' VDEVs
> > > > from the eal command line. It generates the PCIe VDEV device ready for
> > > > further driver probing. It maintains the bifurcated device information,
> > > > including sockfd, hwaddr, mtu, qpairs and iface_name, which other direct
> > > > ring access PMDs use to obtain bifurcated device info.
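
As an illustration of what scanning a '--vdev=rte_bifurc,...' argument
involves, here is a short Python sketch that splits such a string into the
driver name and its iface/qpairs arguments. This is not the parser used by the
patch or by the eal; parse_vdev and bifurc_config are hypothetical helpers, and
the only keys assumed are the two shown in the testpmd example above.

```python
def parse_vdev(arg):
    """Split 'name,key=value,...' (optionally prefixed by --vdev=)."""
    prefix = "--vdev="
    if arg.startswith(prefix):
        arg = arg[len(prefix):]
    name, *pairs = arg.split(",")
    if not name:
        raise ValueError("missing vdev driver name")
    kv = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed key=value pair: {pair!r}")
        kv[key] = value
    return name, kv


def bifurc_config(arg):
    """Extract the iface name and queue pair count for an rte_bifurc vdev."""
    name, kv = parse_vdev(arg)
    if name != "rte_bifurc":
        raise ValueError(f"not an rte_bifurc vdev: {name!r}")
    qpairs = int(kv["qpairs"])
    if qpairs < 1:
        raise ValueError("qpairs must be at least 1")
    return {"iface": kv["iface"], "qpairs": qpairs}


print(bifurc_config("--vdev=rte_bifurc,iface=eth0,qpairs=1"))
```

In the real library the remaining fields listed above (sockfd, hwaddr, mtu)
would come from the netdev itself rather than from the command line.
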
> > > >
> > > > direct ring access PMD
> > > > -------------------------
> > > > The patch provides a direct ring access PMD for ixgbe. Compared to the
> > > > normal ixgbe PMD, it uses the 'RTE_PCI_DRV_BIFURC' flag during
> > > > self-registration. It mostly reuses the existing PMD ops to avoid
> > > > re-implementing everything from scratch. It also modifies
> > > > rx/tx_queue_setup to allow queue setup from any queue offset.
> > > >
> > > > Supported NIC driver
> > > > ========================
> > > >
> > > > The "bifurcated driver" kernel patch only supports the "ixgbe" driver at
> > > > the moment, so this RFC patch also provides an "ixgbe" PMD via
> > > > direct-mapped rings as a sample.
> > > > Support for 40GE (i40e) will be added in the future.
> > > >
> > > > In addition, for any multi-queue NIC with flow director capability to
> > > > perform packet classification and distribution, there's no particular
> > > > technical gap preventing support for the bifurcated driver approach.
> > > >
> > > > Limitation
> > > > ============
> > > >
> > > > By using the "bifurcated driver", user space only takes over the DMA
> > > > operations. NIC configuration settings are outside the control of the
> > > > user-space PMD. All NIC settings, including adding/deleting filter rules,
> > > > need to be made with standard Linux network tools (e.g. ethtool).
> > > > So feature support really depends on how much is supported by ethtool.
> > > >
> > > >
> > > > Any questions, comments and feedback are welcome.
> > > >
> > > >
> > > > -END-
> > > >
> > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > >
> > > > Cunming Liang (6):
> > > >   eal: common direct ring access API
> > > >   eal: direct ring access support by linux af_packet
> > > >   pci: allow VDEV as pci device during device driver probe
> > > >   bifurc: add driver to scan bifurcated netdev
> > > >   ixgbe: rx/tx queue stop bug fix
> > > >   ixgbe: PMD for bifurc ixgbe net device
> > > >
> > > > config/common_linuxapp | 5 +
> > > > lib/Makefile | 1 +
> > > > lib/librte_bifurc/Makefile | 58 +++++
> > > > lib/librte_bifurc/rte_bifurc.c | 284 +++++++++++++++++++++
> > > > lib/librte_bifurc/rte_bifurc.h | 90 +++++++
> > > > lib/librte_eal/common/Makefile | 5 +
> > > > lib/librte_eal/common/include/rte_pci.h | 4 +
> > > > lib/librte_eal/common/include/rte_pci_bifurc.h | 186 ++++++++++++++
> > > > lib/librte_eal/linuxapp/eal/Makefile | 1 +
> > > > lib/librte_eal/linuxapp/eal/eal_pci.c | 42 ++--
> > > > lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c | 336 +++++++++++++++++++++++++
> > > > lib/librte_ether/rte_ethdev.c | 3 +-
> > > > lib/librte_pmd_ixgbe/Makefile | 13 +-
> > > > lib/librte_pmd_ixgbe/ixgbe_bifurcate.c | 303 ++++++++++++++++++++++
> > > > lib/librte_pmd_ixgbe/ixgbe_bifurcate.h | 57 +++++
> > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 44 +++-
> > > > lib/librte_pmd_ixgbe/ixgbe_rxtx.h | 10 +
> > > > mk/rte.app.mk | 6 +
> > > > 18 files changed, 1421 insertions(+), 27 deletions(-)
> > > > create mode 100644 lib/librte_bifurc/Makefile
> > > > create mode 100644 lib/librte_bifurc/rte_bifurc.c
> > > > create mode 100644 lib/librte_bifurc/rte_bifurc.h
> > > > create mode 100644 lib/librte_eal/common/include/rte_pci_bifurc.h
> > > > create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c
> > > > create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.c
> > > > create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.h
> > > >
> > > > --
> > > > 1.8.1.4
> > > >
> > > >
> > > AIUI, the bifurcated driver hasn't yet been accepted upstream, has it?
> > > Given that, I don't think it's wise to pull this in yet ahead of the kernel
> > > work, as there may still be kernel-side changes that the user-space pmd
> > > will have to adapt to.
> > > Neil
> > >
> > Hence the RFC nature of the patch, I believe. :-) Before the kernel part hits
> > the main kernel tree we can at least discuss the overall direction to be taken
> > for this driver, because it's significantly different than any other HW driver.
> >
> > /Bruce