DPDK patches and discussions
From: "Zhou, Danny" <danny.zhou@intel.com>
To: "Walukiewicz, Miroslaw" <Miroslaw.Walukiewicz@intel.com>,
	"Richardson, Bruce" <bruce.richardson@intel.com>
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
Date: Wed, 26 Nov 2014 12:22:47 +0000
Message-ID: <DFDF335405C17848924A094BC35766CF0A9CC895@SHSMSX104.ccr.corp.intel.com>
In-Reply-To: <7C4248CAE043B144B1CD242D275626532FE0EC8B@IRSMSX104.ger.corp.intel.com>


> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Wednesday, November 26, 2014 6:45 PM
> To: Zhou, Danny; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> 
> Thank you for explanation.
> 
> I still have a few questions regarding the setup flow:
> 
> 1. Why do we need this step:
> >   3. Setup a flow director rule to distribute packets with source ip
> > > > > > >      0.0.0.0 to rxq No.0
> > > > > > >   > ethtool -N eth0  flow-type udp4 src-ip 0.0.0.0 action 0
> 
DZ: By default, the ixgbe kernel driver uses 32 rx/tx queue pairs (0-31). The example above sets up a filter
that routes a UDP flow with src-ip 0.0.0.0 to queue No.0, which is used by the kernel driver's rx/tx routine.

> 
> 2. You presented the filter setup for receiving all udp4 packets on specific queue
> > > > > > >   5. Setup a flow director rule to distribute packets with source ip
> > > > > > >      1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > > > >   > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
> 
> How to configure flow director to receive all packets with dst-ip = 1.1.1.1 on qpair=32?
DZ: You can certainly do that with an ethtool command line such as "ethtool -N eth0 flow-type udp4 dst-ip 1.1.1.1 action 32".

> Will TCP SYN packets be caught by such a filter setup?
DZ: Unfortunately, unlike DPDK, which provides the ixgbe_add_syn_filter() API to program the SYN Packet Queue Filter register, the
in-kernel ixgbe driver does not touch that register, although I have seen the ixgbe 3.18.7 driver hard-code a value in it.
In either case, there is no easy way to configure it through the ixgbe bifurcated driver, and under bifurcated mode DPDK cannot access that register.

> 3.  Do we have a possibility to setup a rule like:
> Forward all TCPv4 rx packets with dst-ip =1.1.1.1 and TCP port 2222 to qpair=32 including SYN packets?
DZ: Yes, ethtool and flow director support that. I will send you a separate email on using ethtool to configure flow director.
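
As a rough sketch of such a rule (not taken from the patch set): the helper name
fdir_rule is made up for illustration, and the script only prints the ethtool command
so it can be reviewed before being run as root. flow-type, dst-ip, dst-port and action
are standard "ethtool -N" keywords; eth0 and queue 32 are the values assumed in the
example setup.

```shell
#!/bin/sh
# Dry-run sketch: prints the ethtool command instead of executing it.
# fdir_rule is a hypothetical helper, not part of ethtool or DPDK.
fdir_rule() {
    iface=$1   # netdev driven by the bifurcated driver, e.g. eth0
    flow=$2    # ethtool flow type, e.g. tcp4 or udp4
    queue=$3   # absolute rx queue index, e.g. 32 for the first DPDK qpair
    shift 3    # the remaining arguments are the match fields
    echo "ethtool -N $iface flow-type $flow $* action $queue"
}

# TCPv4 packets with dst-ip 1.1.1.1 and dst-port 2222 go to qpair 32.
fdir_rule eth0 tcp4 32 dst-ip 1.1.1.1 dst-port 2222
```

To apply the rule for real, run the printed command as root once testpmd has started.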

> 3. In your application example you present that qpair number (32) is known before start of application
> > > > > > >   > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > > > >   >  --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > > > >   >  -i --rxfreet=32 --txfreet=32 --txrst=32
> 
> Is dynamic queue allocation possible? I am asking about the API.
>  I mean attaching and detaching queues dynamically at the application level, without specifying the numbers on the command line.
> 
DZ: The example is just for experiment. When DPDK requests queue pairs from the ixgbe bifurcated driver, it only specifies the number of qpairs;
the kernel driver returns the absolute index of each assigned qpair to the application. The application can then use those indexes to set up FD rules,
either by invoking the ethtool command line or by issuing an IOCTL directly to the bifurcated driver.
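
A minimal sketch of the ethtool route, assuming the application has already learned
the absolute index of its first qpair from the kernel driver and passes it in. The
function name gen_rules, its arguments and the choice of one UDP source address per
queue are illustrative assumptions only. The script prints the commands rather than
running them.

```shell
#!/bin/sh
# Dry-run sketch: emit one flow director rule per assigned qpair.
# gen_rules is a hypothetical helper. qbase stands for the absolute index of
# the first qpair returned by the kernel driver (32 in the example setup).
gen_rules() {
    iface=$1   # bifurcated netdev, e.g. eth0
    qbase=$2   # absolute index of the first assigned qpair
    nq=$3      # number of qpairs requested via qpairs=N
    i=0
    while [ "$i" -lt "$nq" ]; do
        # Illustrative mapping: src-ip 1.1.1.1 -> qbase, 1.1.1.2 -> qbase+1, ...
        echo "ethtool -N $iface flow-type udp4 src-ip 1.1.1.$((i + 1)) action $((qbase + i))"
        i=$((i + 1))
    done
}

gen_rules eth0 32 2
```

Because the queue index is an argument rather than a constant, nothing has to be fixed
on the command line before the application starts.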

> 4. Is there a possibility to create a rule with perfect match and directing the packets to the specific queue.
> I mean here a rule like:
> Forward all TCPv4 rx packets with dst-ip=1.1.1.1 src-ip=2.2.2.2 dst-port=2222 src-port=1234 to queue 33
> 
DZ: Yes, of course you can.
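
For illustration only, the rule from the question could be written as below. The
keywords are standard "ethtool -N" options, but whether the in-kernel ixgbe driver
accepts this exact field combination alongside the rules already installed depends on
its flow director configuration, so treat it as an untested sketch. The script prints
the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of a 4-tuple TCPv4 match steered to queue 33.
# IFACE and QUEUE are the values assumed in the question.
IFACE=eth0
QUEUE=33
RULE="flow-type tcp4 src-ip 2.2.2.2 dst-ip 1.1.1.1 src-port 1234 dst-port 2222 action $QUEUE"

# Command that would add the rule (run as root after testpmd has started).
echo "ethtool -N $IFACE $RULE"

# Command that would list the installed rules so the new one can be checked.
echo "ethtool -n $IFACE"
```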

> Regards,
> 
> Mirek
> 
> > -----Original Message-----
> > From: Zhou, Danny
> > Sent: Tuesday, November 25, 2014 4:23 PM
> > To: Richardson, Bruce; Walukiewicz, Miroslaw
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> >
> >
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Tuesday, November 25, 2014 11:03 PM
> > > To: Walukiewicz, Miroslaw
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> > >
> > > On Tue, Nov 25, 2014 at 02:57:13PM +0000, Walukiewicz, Miroslaw wrote:
> > > > Thank you Bruce for explanation of the idea.
> > >
> > > Actually, credit goes to Steve Liang, not me, for the explanation. :-)
> > >
> > > >
> > > > I have question regarding TCP SYN packets? Do you have any idea how to
> > share the TCP SYN requests between kernel and
> > > user-space application?
> > >
> > > As I'm giving the credit to Steve, I'll also pass the buck for answering that
> > > question to him too! :-)
> > >
> > > /Bruce
> >
> > In ixgbe's Rx queuing flow, the SYN filter match stage comes before the Flow Director
> > filter stage. When working in bifurcated driver support mode,
> > DPDK cannot access NIC registers other than the ones used to
> > rx/tx packets on its assigned rx/tx queue pairs. So it really
> > depends on the user to set up the SYN filter via ethtool or another interface to the
> > ixgbe bifurcated driver. The user can distribute TCP SYN packets to
> > rx queues owned by the kernel bifurcated driver or to rx queues owned by DPDK. In the
> > latter case, DPDK can still push them back to the kernel via KNI if it
> > does not want to use them. If you have a user space TCP/IP stack on top of
> > DPDK, you can push them to the upper level stack instead.
> >
> > > >
> > > > Regards,
> > > >
> > > > Mirek
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce
> > Richardson
> > > > > Sent: Tuesday, November 25, 2014 3:30 PM
> > > > > To: Neil Horman
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated
> > driver
> > > > >
> > > > > On Tue, Nov 25, 2014 at 09:23:16AM -0500, Neil Horman wrote:
> > > > > > On Tue, Nov 25, 2014 at 10:11:16PM +0800, Cunming Liang wrote:
> > > > > > >
> > > > > > > This is a RFC patch set to support "bifurcated driver" in DPDK.
> > > > > > >
> > > > > > >
> > > > > > > What is "bifurcated driver"?
> > > > > > > ===========================
> > > > > > >
> > > > > > > The "bifurcated driver" stands for the kernel NIC driver that
> > supports:
> > > > > > >
> > > > > > > 1. on-demand rx/tx queue pairs split-off and assignment to user
> > space
> > > > > > >
> > > > > > > 2. direct NIC resource(e.g. rx/tx queue registers) access from user
> > space
> > > > > > >
> > > > > > > 3. distributing packets to kernel or user space rx queues by
> > > > > > >    NIC's flow director according to the filter rules
> > > > > > >
> > > > > > > Here's the kernel patch set to support.
> > > > > > > http://comments.gmane.org/gmane.linux.network/333615
> > > > > > >
> > > > > > >
> > > > > > > Usage scenario
> > > > > > > =================
> > > > > > >
> > > > > > > It's well accepted by industry to use DPDK to process fast path
> > packets in
> > > > > > > user space in a high performance fashion, meanwhile processing
> > slow
> > > > > path
> > > > > > > control packets in kernel space is still needed as those packets
> > usually
> > > > > > > rely on in_kernel TCP/IP stacks and/or socket programming
> > interface.
> > > > > > >
> > > > > > > KNI(Kernel NIC Interface) mechanism in DPDK is designed to meet
> > this
> > > > > > > requirement, with below limitation:
> > > > > > >
> > > > > > >   1) Software classifies packets and distributes them to kernel via
> > DPDK
> > > > > > >      software rings, at the cost of significant CPU cycles and memory
> > > > > bandwidth.
> > > > > > >
> > > > > > >   2) Memory copy packets between kernel' socket buffer and mbuf
> > brings
> > > > > > >      significant negative performance impact to KNI performance.
> > > > > > >
> > > > > > > The bifurcated driver provides a alternative approach that not only
> > > > > offloads
> > > > > > > flow classification and distribution to NIC but also support packets
> > > > > zero_copy.
> > > > > > >
> > > > > > > User can use standard ethtool to add filter rules to the NIC in order
> > to
> > > > > > > distribute specific flows to the queues only accessed by kernel
> > driver and
> > > > > > > stack, and add other rules to distribute packets to the queues
> > assigned to
> > > > > > > user-space.
> > > > > > >
> > > > > > > For those rx/tx queue pairs that directly accessed from user space,
> > > > > > > DPDK takes over the packets rx/tx as well as corresponding DMA
> > > > > operation
> > > > > > > for high performance packet I/O.
> > > > > > >
> > > > > > >
> > > > > > > What's the impact and change to DPDK
> > > > > > > ======================================
> > > > > > >
> > > > > > > DPDK usually binds PCIe NIC devices by leveraging kernel' user
> > space
> > > > > driver
> > > > > > > mechanism UIO or VFIO to map entire NIC' PCIe I/O space of NIC to
> > user
> > > > > space.
> > > > > > > The bifurcated driver PMD talks to a NIC interface using raw socket
> > APIs
> > > > > and
> > > > > > > only mmap() limited I/O space (e.g. certain 4K pages) for accessing
> > > > > involved
> > > > > > > rx/tx queue pairs. So the impact and changes mainly comes with
> > below:
> > > > > > >
> > > > > > > - netdev
> > > > > > >     DPDK needs to create a af_packet socket and bind it to a
> > bifurcated
> > > > > netdev.
> > > > > > >     The socket fd will be used to request 'queue pairs info',
> > > > > > >     'split/return queue pairs' and etc. The PCIe device ID, netdev MAC
> > > > > address,
> > > > > > >     numa info are also from the netdev response.
> > > > > > >
> > > > > > > - PCIe device scan and driver probe
> > > > > > >     netdev provides the PCIe device ID information. Refer to the
> > device ID,
> > > > > > >     the correct driver should be used. And for such netdev device,
> > the
> > > > > creation
> > > > > > >     of PCIe device is no longer from scan but the on-demand
> > assignment.
> > > > > > >
> > > > > > > - PCIe BAR mapping
> > > > > > >     "bifurcated driver" maps several pages for the queue pairs.
> > > > > > >     Others BAR register space maps to a fake page. The BAR mapping
> > go
> > > > > through
> > > > > > >     mmap on sockfd. Which is a little different from what UIO/VFIO
> > does.
> > > > > > >
> > > > > > > - PMD
> > > > > > >     The PMD will no longer really initialize and configure NIC.
> > > > > > >     Instead, it only takes care the queue pair setup, rx_burst and
> > tx_burst.
> > > > > > >
> > > > > > > The patch uses eal '--vdev' parameter to assign netdev iface name
> > and
> > > > > number of
> > > > > > > queue pairs. Here's a example about how to configure the
> > bifurcated
> > > > > driver and
> > > > > > > run DPDK testpmd with bifurcated PMD.
> > > > > > >
> > > > > > >   1. Set promisc mode
> > > > > > >   > ifconfig eth0 promisc
> > > > > > >
> > > > > > >   2. Turn on fdir
> > > > > > >   > ethtool -K eth0 ntuple on
> > > > > > >
> > > > > > >   3. Setup a flow director rule to distribute packets with source ip
> > > > > > >      0.0.0.0 to rxq No.0
> > > > > > >   > ethtool -N eth0  flow-type udp4 src-ip 0.0.0.0 action 0
> > > > > > >
> > > > > > >   4. Run testpmd on netdev 'eth0' with 1 queue pair.
> > > > > > >   > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > > > >   >  --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > > > >   >  -i --rxfreet=32 --txfreet=32 --txrst=32
> > > > > > >   Note:
> > > > > > >     iface and qpairs arguments above specify the netdev interface
> > name
> > > > > and
> > > > > > >     number of qpairs that user space request from the "bifurcated
> > driver"
> > > > > > >     respectively.
> > > > > > >
> > > > > > >   5. Setup a flow director rule to distribute packets with source ip
> > > > > > >      1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > > > >   > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
> > > > > > >
> > > > > > > Below illustrates the detailed changes in this patch set.
> > > > > > >
> > > > > > > eal
> > > > > > > --------
> > > > > > > The first two patches are all about the eal API declaration and Linux
> > > > > version
> > > > > > > definition to support af_packet socket and verbs of bifurcated
> > netdev.
> > > > > > > > Those APIs include the verbs like open, bind, (un)map, split/return,
> > > > > map_umem.
> > > > > > > And other APIs like set_pci, get_ifinfo and get/put_devargs which
> > help to
> > > > > > > generate pci device from bifurcated netdev and get basic netdev
> > info.
> > > > > > >
> > > > > > > The third patch is used to allow probing driver on the PCIe VDEV
> > created
> > > > > from
> > > > > > > a NIC interface driven by "bifurcated driver". It defines a new flag
> > > > > > > 'RTE_PCI_DRV_BIFURC' used for direct ring access PMD.
> > > > > > >
> > > > > > > librte_bifurc
> > > > > > > ---------------
> > > > > > > The library is used as a VDEV bus driver to scan '--vdev=rte_bifurc'
> > VDEV
> > > > > > > from eal command-line. It generates the PCIe VDEV device ready
> > for
> > > > > further
> > > > > > > driver probe. It maintains the bifurcated device information include
> > > > > sockfd,
> > > > > > > hwaddr, mtu, qpairs, iface_name. It's used for other direct ring
> > access
> > > > > PMD
> > > > > > > to apply for bifurcated device info.
> > > > > > >
> > > > > > > direct ring access PMD
> > > > > > > -------------------------
> > > > > > > The patch provides direct ring access PMD for ixgbe. Comparing to
> > the
> > > > > normal
> > > > > > > PMD ixgbe, it uses 'RTE_PCI_DRV_BIFURC' flag during self
> > registration.
> > > > > > > It mostly reuses the existing PMD ops to avoid re-implementing
> > > > > everything
> > > > > > > from scratch. And it also modifies the rx/tx_queue_setup to allow
> > queue
> > > > > > > setup from any queue offset.
> > > > > > >
> > > > > > > Supported NIC driver
> > > > > > > ========================
> > > > > > >
> > > > > > > The "bifurcated driver" kernel patch only supports "ixgbe" driver at
> > the
> > > > > moment,
> > > > > > > so this RFC patch also provides "ixgbe" PMD via direct-mapped rings
> > as
> > > > > sample.
> > > > > > > The support for 40GE(i40e) will be added in the future.
> > > > > > >
> > > > > > > In addition, for those multi-queues enabled NIC with flow director
> > > > > capability
> > > > > > > > to perform packet classification and distribution, there's no
> > special
> > > > > > > technical gap to provide bifurcated driver approach support.
> > > > > > >
> > > > > > > Limitation
> > > > > > > ============
> > > > > > >
> > > > > > > By using "bifurcated driver", user space only takes over the DMA
> > > > > operation.
> > > > > > > For those NIC configure setting, it's out of control from user space
> > PMD.
> > > > > > > All the NIC setting including add/del filter rules need to be done by
> > > > > > > standard Linux network tools(e.g. ethtool).
> > > > > > > So the feature support really depend on how much are supported
> > by
> > > > > ethtool.
> > > > > > >
> > > > > > >
> > > > > > > Any questions, comments and feedback are welcome.
> > > > > > >
> > > > > > >
> > > > > > > -END-
> > > > > > >
> > > > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > > > >
> > > > > > > *** BLURB HERE ***
> > > > > > >
> > > > > > > Cunming Liang (6):
> > > > > > >   eal: common direct ring access API
> > > > > > >   eal: direct ring access support by linux af_packet
> > > > > > >   pci: allow VDEV as pci device during device driver probe
> > > > > > >   bifurc: add driver to scan bifurcated netdev
> > > > > > >   ixgbe: rx/tx queue stop bug fix
> > > > > > >   ixgbe: PMD for bifurc ixgbe net device
> > > > > > >
> > > > > > >  config/common_linuxapp                         |   5 +
> > > > > > >  lib/Makefile                                   |   1 +
> > > > > > >  lib/librte_bifurc/Makefile                     |  58 +++++
> > > > > > >  lib/librte_bifurc/rte_bifurc.c                 | 284
> > +++++++++++++++++++++
> > > > > > >  lib/librte_bifurc/rte_bifurc.h                 |  90 +++++++
> > > > > > >  lib/librte_eal/common/Makefile                 |   5 +
> > > > > > >  lib/librte_eal/common/include/rte_pci.h        |   4 +
> > > > > > >  lib/librte_eal/common/include/rte_pci_bifurc.h | 186
> > ++++++++++++++
> > > > > > >  lib/librte_eal/linuxapp/eal/Makefile           |   1 +
> > > > > > >  lib/librte_eal/linuxapp/eal/eal_pci.c          |  42 ++--
> > > > > > >  lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c   | 336
> > > > > +++++++++++++++++++++++++
> > > > > > >  lib/librte_ether/rte_ethdev.c                  |   3 +-
> > > > > > >  lib/librte_pmd_ixgbe/Makefile                  |  13 +-
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.c         | 303
> > > > > ++++++++++++++++++++++
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.h         |  57 +++++
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c              |  44 +++-
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.h              |  10 +
> > > > > > >  mk/rte.app.mk                                  |   6 +
> > > > > > >  18 files changed, 1421 insertions(+), 27 deletions(-)
> > > > > > >  create mode 100644 lib/librte_bifurc/Makefile
> > > > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.c
> > > > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.h
> > > > > > >  create mode 100644
> > lib/librte_eal/common/include/rte_pci_bifurc.h
> > > > > > >  create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c
> > > > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.c
> > > > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.h
> > > > > > >
> > > > > > > --
> > > > > > > 1.8.1.4
> > > > > > >
> > > > > > >
> > > > > > AIUI, the bifurcated driver hasn't yet been accepted upstream, has it?
> > > > > Given
> > > > > > that, I don't think its wise to pull this in yet ahead of the kernel work,
> > as
> > > > > > there may still be kernel side changes that the user space pmd will
> > have to
> > > > > > adapt to.
> > > > > > Neil
> > > > > >
> > > > > Hence the RFC nature of the patch, I believe. :-) Before the kernel part
> > hits
> > > > > the
> > > > > main kernel tree we can at least discuss the overall direction to be
> > taken for
> > > > > this driver because it's significantly different that any other HW driver.
> > > > >
> > > > > /Bruce
