From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <danny.zhou@intel.com>
Received: from mga09.intel.com (mga09.intel.com [134.134.136.24])
 by dpdk.org (Postfix) with ESMTP id E0E7C3975
 for <dev@dpdk.org>; Tue, 25 Nov 2014 16:12:56 +0100 (CET)
Received: from orsmga002.jf.intel.com ([10.7.209.21])
 by orsmga102.jf.intel.com with ESMTP; 25 Nov 2014 07:20:54 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.07,456,1413270000"; d="scan'208";a="643271491"
Received: from pgsmsx101.gar.corp.intel.com ([10.221.44.78])
 by orsmga002.jf.intel.com with ESMTP; 25 Nov 2014 07:23:18 -0800
Received: from shsmsx102.ccr.corp.intel.com (10.239.4.154) by
 PGSMSX101.gar.corp.intel.com (10.221.44.78) with Microsoft SMTP Server (TLS)
 id 14.3.195.1; Tue, 25 Nov 2014 23:23:16 +0800
Received: from shsmsx104.ccr.corp.intel.com ([169.254.5.182]) by
 shsmsx102.ccr.corp.intel.com ([169.254.2.216]) with mapi id 14.03.0195.001;
 Tue, 25 Nov 2014 23:23:15 +0800
From: "Zhou, Danny" <danny.zhou@intel.com>
To: "Richardson, Bruce" <bruce.richardson@intel.com>, "Walukiewicz, Miroslaw"
 <Miroslaw.Walukiewicz@intel.com>
Thread-Topic: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
Thread-Index: AQHQCLnU5FCMVbS0qkK44I+J1nYJ1Zxw3yQAgAAB3oCAAAeegIAAAZwAgACI9VA=
Date: Tue, 25 Nov 2014 15:23:14 +0000
Message-ID: <DFDF335405C17848924A094BC35766CF0A9CAD17@SHSMSX104.ccr.corp.intel.com>
References: <1416924682-24170-1-git-send-email-cunming.liang@intel.com>
 <20141125142316.GD23352@hmsreliant.think-freely.org>
 <20141125142956.GA6672@bricha3-MOBL3>
 <7C4248CAE043B144B1CD242D275626532FE0E6E5@IRSMSX104.ger.corp.intel.com>
 <20141125150257.GB6800@bricha3-MOBL3>
In-Reply-To: <20141125150257.GB6800@bricha3-MOBL3>
Accept-Language: zh-CN, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [10.239.127.40]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Tue, 25 Nov 2014 15:13:00 -0000



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> Sent: Tuesday, November 25, 2014 11:03 PM
> To: Walukiewicz, Miroslaw
> Cc: dev@dpdk.org
> Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
>
> On Tue, Nov 25, 2014 at 02:57:13PM +0000, Walukiewicz, Miroslaw wrote:
> > Thank you, Bruce, for the explanation of the idea.
>
> Actually, credit goes to Steve Liang, not me, for the explanation. :-)
>
> >
> > I have a question regarding TCP SYN packets. Do you have any idea how to
> > share the TCP SYN requests between the kernel and the user-space application?
>
> As I'm giving the credit to Steve, I'll also pass the buck for answering that
> question to him too! :-)
>=20
> /Bruce

In ixgbe's Rx queuing flow, the SYN filter stage comes before the Flow Director
filter stage. When working in bifurcated driver mode, DPDK cannot access any NIC
registers except those used to rx/tx packets on its assigned rx/tx queue pairs.
So it really depends on the user to set up the SYN filter via the ixgbe
bifurcated driver, using ethtool or another interface. The user can steer TCP
SYN packets either to rx queues owned by the kernel bifurcated driver or to rx
queues owned by DPDK. In the latter case, DPDK can still push them back to the
kernel via KNI if it does not want to use them; if you have a user-space TCP/IP
stack on top of DPDK, you can push them up to that stack instead.
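A minimal sketch of the DPDK-side decision for SYNs that land on a DPDK-owned rx queue follows. It parses a plain byte buffer (untagged Ethernet, IPv4, TCP) instead of an rte_mbuf, and the `classify` helper, the `syn_target` enum and the `have_user_stack` switch are hypothetical names for illustration; a real application would then hand the packet to KNI (for example with rte_kni_tx_burst()) or to its own stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical dispatch targets for a packet seen on a DPDK-owned rx queue. */
enum syn_target { TO_KERNEL_VIA_KNI, TO_USER_STACK, NOT_A_SYN };

/* True if the frame is untagged Ethernet + IPv4 + TCP with SYN set and
 * ACK clear, i.e. a connection request. VLAN and IPv6 are not handled. */
static bool is_tcp_syn(const uint8_t *frame, size_t len)
{
    if (len < 14 + 20)
        return false;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != 0x0800)                 /* IPv4 only */
        return false;
    const uint8_t *ip = frame + 14;
    if ((ip[0] >> 4) != 4 || ip[9] != 6)     /* version 4, protocol TCP */
        return false;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4; /* IPv4 header length in bytes */
    if (ihl < 20 || len < 14 + ihl + 20)
        return false;
    uint8_t flags = ip[ihl + 13];            /* TCP flags byte */
    return (flags & 0x02) && !(flags & 0x10);  /* SYN set, ACK clear */
}

/* Decide where a SYN should go; everything else is left to the fast path. */
static enum syn_target classify(const uint8_t *frame, size_t len,
                                bool have_user_stack)
{
    if (!is_tcp_syn(frame, len))
        return NOT_A_SYN;
    return have_user_stack ? TO_USER_STACK : TO_KERNEL_VIA_KNI;
}
```

Treating only SYN-without-ACK as a connection request is a choice made for this sketch; IP fragments are not considered.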

> >
> > Regards,
> >
> > Mirek
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Tuesday, November 25, 2014 3:30 PM
> > > To: Neil Horman
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> > >
> > > On Tue, Nov 25, 2014 at 09:23:16AM -0500, Neil Horman wrote:
> > > > On Tue, Nov 25, 2014 at 10:11:16PM +0800, Cunming Liang wrote:
> > > > >
> > > > > This is an RFC patch set to support the "bifurcated driver" in DPDK.
> > > > >
> > > > >
> > > > > What is "bifurcated driver"?
> > > > > ============================
> > > > >
> > > > > The "bifurcated driver" is a kernel NIC driver that supports:
> > > > >
> > > > > 1. on-demand split-off of rx/tx queue pairs and their assignment to
> > > > >    user space
> > > > >
> > > > > 2. direct access to NIC resources (e.g. rx/tx queue registers) from
> > > > >    user space
> > > > >
> > > > > 3. distributing packets to kernel or user-space rx queues with the
> > > > >    NIC's flow director, according to filter rules
> > > > >
> > > > > Here's the supporting kernel patch set:
> > > > > http://comments.gmane.org/gmane.linux.network/333615
> > > > >
> > > > >
> > > > > Usage scenario
> > > > > ==============
> > > > >
> > > > > It is well accepted in the industry to use DPDK to process fast-path
> > > > > packets in user space in a high-performance fashion; meanwhile,
> > > > > processing slow-path control packets in kernel space is still needed,
> > > > > as those packets usually rely on the in-kernel TCP/IP stack and/or the
> > > > > socket programming interface.
> > > > >
> > > > > The KNI (Kernel NIC Interface) mechanism in DPDK is designed to meet
> > > > > this requirement, with the following limitations:
> > > > >
> > > > >   1) Software classifies packets and distributes them to the kernel via
> > > > >      DPDK software rings, at the cost of significant CPU cycles and
> > > > >      memory bandwidth.
> > > > >
> > > > >   2) Copying packets between the kernel's socket buffers and mbufs has
> > > > >      a significant negative impact on KNI performance.
> > > > >
> > > > > The bifurcated driver provides an alternative approach that not only
> > > > > offloads flow classification and distribution to the NIC but also
> > > > > supports zero-copy packet I/O.
> > > > >
> > > > > Users can use the standard ethtool utility to add filter rules to the
> > > > > NIC in order to distribute specific flows to queues accessed only by
> > > > > the kernel driver and stack, and add other rules to distribute packets
> > > > > to queues assigned to user space.
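To make the split concrete, the sketch below models how ethtool-installed rules steer flows: each rule matches a UDP/IPv4 source address and names a destination rx queue, and a separate ownership table records whether that queue belongs to the kernel or has been split off to user space. The structures, the "first match wins" order, the default queue 0, the 128-queue table size and the choice of queue 32 as the user-space queue are all assumptions for illustration (they mirror the testpmd example in this mail); this is not the ixgbe flow director's actual data layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_RXQ 128   /* assumed number of hardware rx queues */

/* Hypothetical model of one rule: "flow-type udp4 src-ip X action Q". */
struct fdir_rule {
    uint32_t src_ip;   /* host byte order */
    uint16_t queue;    /* destination rx queue index */
};

enum q_owner { OWNER_KERNEL, OWNER_USER };

/* Rx queue for a UDP/IPv4 packet: first matching rule wins, otherwise
 * default_queue (an assumption of this model). */
static uint16_t fdir_lookup(const struct fdir_rule *rules, size_t n,
                            uint32_t src_ip, uint16_t default_queue)
{
    for (size_t i = 0; i < n; i++)
        if (rules[i].src_ip == src_ip)
            return rules[i].queue;
    return default_queue;
}

/* Which side ends up handling the packet, given per-queue ownership. */
static enum q_owner deliver_to(const enum q_owner owner[NB_RXQ],
                               const struct fdir_rule *rules, size_t n,
                               uint32_t src_ip)
{
    uint16_t q = fdir_lookup(rules, n, src_ip, 0);
    assert(q < NB_RXQ);   /* rules must name an existing queue */
    return owner[q];
}
```

Keeping rules and queue ownership in separate tables reflects the division of labour described above: ethtool edits the rules, while split-off requests change which queues user space owns.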
> > > > >
> > > > > For the rx/tx queue pairs that are directly accessed from user space,
> > > > > DPDK takes over packet rx/tx as well as the corresponding DMA
> > > > > operations for high-performance packet I/O.
> > > > >
> > > > >
> > > > > What's the impact and change to DPDK
> > > > > ====================================
> > > > >
> > > > > DPDK usually binds PCIe NIC devices by leveraging the kernel's
> > > > > user-space driver mechanisms (UIO or VFIO) to map the NIC's entire
> > > > > PCIe I/O space to user space. The bifurcated driver PMD talks to a NIC
> > > > > interface using raw socket APIs and only mmap()s a limited part of the
> > > > > I/O space (e.g. certain 4K pages) for accessing the involved rx/tx
> > > > > queue pairs. So the impact and changes mainly come from the following:
> > > > >
> > > > > - netdev
> > > > >     DPDK needs to create an af_packet socket and bind it to a
> > > > >     bifurcated netdev. The socket fd will be used to request 'queue
> > > > >     pairs info', 'split/return queue pairs', etc. The PCIe device ID,
> > > > >     netdev MAC address and NUMA info also come from the netdev
> > > > >     response.
> > > > >
> > > > > - PCIe device scan and driver probe
> > > > >     The netdev provides the PCIe device ID information, and the
> > > > >     correct driver is chosen according to that device ID. For such a
> > > > >     netdev device, the PCIe device is no longer created by the bus
> > > > >     scan but by on-demand assignment.
> > > > >
> > > > > - PCIe BAR mapping
> > > > >     The "bifurcated driver" maps several pages for the queue pairs;
> > > > >     the rest of the BAR register space maps to a fake page. The BAR
> > > > >     mapping goes through mmap on the sockfd, which is a little
> > > > >     different from what UIO/VFIO does.
> > > > >
> > > > > - PMD
> > > > >     The PMD no longer really initializes and configures the NIC.
> > > > >     Instead, it only takes care of queue pair setup, rx_burst and
> > > > >     tx_burst.
> > > > >
> > > > > The patch uses the eal '--vdev' parameter to assign the netdev iface
> > > > > name and the number of queue pairs. Here's an example of how to
> > > > > configure the bifurcated driver and run DPDK testpmd with the
> > > > > bifurcated PMD.
> > > > >
> > > > >   1. Set promisc mode
> > > > >   > ifconfig eth0 promisc
> > > > >
> > > > >   2. Turn on fdir
> > > > >   > ethtool -K eth0 ntuple on
> > > > >
> > > > >   3. Set up a flow director rule to distribute packets with source IP
> > > > >      0.0.0.0 to rxq No.0
> > > > >   > ethtool -N eth0 flow-type udp4 src-ip 0.0.0.0 action 0
> > > > >
> > > > >   4. Run testpmd on netdev 'eth0' with 1 queue pair.
> > > > >   > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > >   >  --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > >   >  -i --rxfreet=32 --txfreet=32 --txrst=32
> > > > >   Note:
> > > > >     The iface and qpairs arguments above specify the netdev interface
> > > > >     name and the number of qpairs that user space requests from the
> > > > >     "bifurcated driver", respectively.
> > > > >
> > > > >   5. Set up a flow director rule to distribute packets with source IP
> > > > >      1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > >   > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
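The '--vdev' string in step 4 carries the device name followed by comma-separated key=value pairs. A minimal sketch of pulling iface and qpairs out of it is below; `struct bifurc_args` and `parse_bifurc_vdev` are hypothetical names, and DPDK itself would more likely use its rte_kvargs helpers than hand-written parsing.

```c
#define _POSIX_C_SOURCE 200809L   /* for strtok_r */
#include <assert.h>
#include <ctype.h>
#include <net/if.h>               /* IF_NAMESIZE */
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two arguments shown in step 4. */
struct bifurc_args {
    char iface[IF_NAMESIZE];
    unsigned int qpairs;
};

/* Parse e.g. "rte_bifurc,iface=eth0,qpairs=1". Returns 0 on success,
 * -1 on any error (wrong name, unknown or missing key, bad number). */
static int parse_bifurc_vdev(const char *vdev, struct bifurc_args *out)
{
    char buf[128];
    if (strlen(vdev) >= sizeof(buf))
        return -1;
    strcpy(buf, vdev);            /* strtok_r modifies its input */

    char *save = NULL;
    char *tok = strtok_r(buf, ",", &save);
    if (tok == NULL || strcmp(tok, "rte_bifurc") != 0)
        return -1;                /* not a bifurcated vdev */

    int have_iface = 0, have_qpairs = 0;
    while ((tok = strtok_r(NULL, ",", &save)) != NULL) {
        char *eq = strchr(tok, '=');
        if (eq == NULL)
            return -1;            /* expected key=value */
        *eq = '\0';
        const char *val = eq + 1;

        if (strcmp(tok, "iface") == 0) {
            size_t n = strlen(val);
            if (n == 0 || n >= IF_NAMESIZE)
                return -1;
            memcpy(out->iface, val, n + 1);
            have_iface = 1;
        } else if (strcmp(tok, "qpairs") == 0) {
            char *end;
            if (!isdigit((unsigned char)val[0]))
                return -1;        /* rejects "", "-1", "+1" */
            unsigned long q = strtoul(val, &end, 10);
            if (*end != '\0' || q == 0)
                return -1;
            out->qpairs = (unsigned int)q;
            have_qpairs = 1;
        } else {
            return -1;            /* unknown key */
        }
    }
    return (have_iface && have_qpairs) ? 0 : -1;
}
```

Rejecting unknown keys and requiring both arguments is a conservative choice for this sketch; the patch may treat them differently.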
> > > > >
> > > > > The following describes the detailed changes in this patch set.
> > > > >
> > > > > eal
> > > > > --------
> > > > > The first two patches are all about the eal API declarations and their
> > > > > Linux definitions, which support the af_packet socket and the verbs of
> > > > > the bifurcated netdev. Those APIs include verbs like open, bind,
> > > > > (un)map, split/return and map_umem, as well as other APIs like set_pci,
> > > > > get_ifinfo and get/put_devargs, which help to generate a PCI device
> > > > > from the bifurcated netdev and get basic netdev info.
> > > > >
> > > > > The third patch allows probing a driver on the PCIe VDEV created from
> > > > > a NIC interface driven by the "bifurcated driver". It defines a new
> > > > > flag, 'RTE_PCI_DRV_BIFURC', used for direct ring access PMDs.
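The new flag is what lets the probe step tell the two ixgbe PMDs apart. The sketch below shows one plausible matching rule: a driver matches when the device ID is in its ID table and its "bifurcated" flag agrees with how the device was created. The structure layouts, `DRV_FLAG_BIFURC` (a stand-in for RTE_PCI_DRV_BIFURC with an arbitrary value) and the helper names are hypothetical and much simpler than DPDK's rte_pci_driver; the ID 0x8086:0x10FB (an 82599 device) is sample data only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the patch's RTE_PCI_DRV_BIFURC; value chosen arbitrarily. */
#define DRV_FLAG_BIFURC 0x1u

struct pci_id { uint16_t vendor_id, device_id; };

struct pci_drv {
    const char *name;
    const struct pci_id *id_table;   /* terminated by vendor_id == 0 */
    uint32_t drv_flags;
};

struct pci_dev {
    struct pci_id id;
    int is_bifurc_vdev;   /* created on demand from a bifurcated netdev */
};

static int id_matches(const struct pci_drv *drv, const struct pci_id *id)
{
    for (const struct pci_id *p = drv->id_table; p->vendor_id != 0; p++)
        if (p->vendor_id == id->vendor_id && p->device_id == id->device_id)
            return 1;
    return 0;
}

/* First driver whose ID table matches and whose bifurcated flag agrees
 * with the device type; NULL if none. */
static const struct pci_drv *select_driver(const struct pci_drv *drvs,
                                           size_t n, const struct pci_dev *dev)
{
    for (size_t i = 0; i < n; i++) {
        int drv_bifurc = (drvs[i].drv_flags & DRV_FLAG_BIFURC) != 0;
        if (drv_bifurc == (dev->is_bifurc_vdev != 0) &&
            id_matches(&drvs[i], &dev->id))
            return &drvs[i];
    }
    return NULL;
}
```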
> > > > >
> > > > > librte_bifurc
> > > > > ---------------
> > > > > The library is used as a VDEV bus driver to scan for '--vdev=rte_bifurc'
> > > > > VDEVs on the eal command line. It generates the PCIe VDEV device, ready
> > > > > for the subsequent driver probe. It maintains the bifurcated device
> > > > > information, including sockfd, hwaddr, mtu, qpairs and iface_name,
> > > > > which other direct ring access PMDs use to obtain bifurcated device
> > > > > info.
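The per-device record kept by librte_bifurc can be pictured as a small table keyed by interface name. The field list follows the description above (sockfd, hwaddr, mtu, qpairs, iface_name); the struct layout, the fixed table size and the `bifurc_add`/`bifurc_get_info` helpers are hypothetical and are not the contents of rte_bifurc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BIFURC_MAX_DEVS 8    /* assumed table size */
#define BIFURC_IFNAMSIZ 16

/* Hypothetical record mirroring the fields listed in the cover letter. */
struct bifurc_dev_info {
    int sockfd;                        /* af_packet socket bound to the netdev */
    uint8_t hwaddr[6];                 /* netdev MAC address */
    uint16_t mtu;
    uint16_t qpairs;                   /* queue pairs split off to user space */
    char iface_name[BIFURC_IFNAMSIZ];
};

static struct bifurc_dev_info bifurc_devs[BIFURC_MAX_DEVS];
static size_t bifurc_nb_devs;

/* Record a device found while scanning --vdev arguments; NULL if full
 * or the name is too long. */
static struct bifurc_dev_info *bifurc_add(const char *iface, int sockfd,
                                          uint16_t qpairs)
{
    if (bifurc_nb_devs == BIFURC_MAX_DEVS || strlen(iface) >= BIFURC_IFNAMSIZ)
        return NULL;
    struct bifurc_dev_info *d = &bifurc_devs[bifurc_nb_devs++];
    memset(d, 0, sizeof(*d));
    strcpy(d->iface_name, iface);
    d->sockfd = sockfd;
    d->qpairs = qpairs;
    return d;
}

/* What a direct ring access PMD might call to fetch the info by name. */
static const struct bifurc_dev_info *bifurc_get_info(const char *iface)
{
    for (size_t i = 0; i < bifurc_nb_devs; i++)
        if (strcmp(bifurc_devs[i].iface_name, iface) == 0)
            return &bifurc_devs[i];
    return NULL;
}
```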
> > > > >
> > > > > direct ring access PMD
> > > > > -------------------------
> > > > > The patch provides a direct ring access PMD for ixgbe. Compared to the
> > > > > normal ixgbe PMD, it uses the 'RTE_PCI_DRV_BIFURC' flag during
> > > > > self-registration. It mostly reuses the existing PMD ops to avoid
> > > > > re-implementing everything from scratch. It also modifies
> > > > > rx/tx_queue_setup to allow queue setup from any queue offset.
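Allowing queue setup "from any queue offset" means the PMD's queue 0 may really be a higher hardware queue; the testpmd example suggests queue 32, which is assumed here. The sketch shows that translation and then which 4K BAR page holds the queue's descriptor base registers, the kind of page the bifurcated driver would expose via mmap. The register formulas follow the IXGBE_RDBAL()/IXGBE_TDBAL() macros of the Linux ixgbe driver for 82599; that these are exactly the pages the kernel patch maps is an assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SZ 4096u

/* Map a PMD-visible queue id onto the hardware queue split off by the kernel. */
static uint16_t hw_queue(uint16_t qpair_base, uint16_t logical_q,
                         uint16_t nb_qpairs)
{
    assert(logical_q < nb_qpairs);   /* PMD may only touch its own queues */
    return (uint16_t)(qpair_base + logical_q);
}

/* BAR offset of the Rx descriptor base-low register (RDBAL) for queue q:
 * queues 0-63 start at 0x01000, queues 64-127 at 0x0D000, 0x40 apart. */
static uint32_t rdbal_offset(uint16_t q)
{
    return q < 64 ? 0x01000u + q * 0x40u : 0x0D000u + (q - 64u) * 0x40u;
}

/* BAR offset of the Tx descriptor base-low register (TDBAL) for queue q. */
static uint32_t tdbal_offset(uint16_t q)
{
    return 0x06000u + q * 0x40u;
}

/* Index of the 4K BAR page that contains a register offset. */
static uint32_t bar_page(uint32_t reg_offset)
{
    return reg_offset / PAGE_SZ;
}
```

Under these assumptions, with qpairs=1 and a split-off base of 32, testpmd's queue 0 uses RDBAL at 0x1800 (page 1) and TDBAL at 0x6800 (page 6), so only a couple of pages need real mappings while the rest of the BAR can point at the fake page.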
> > > > >
> > > > > Supported NIC driver
> > > > > ====================
> > > > >
> > > > > The "bifurcated driver" kernel patch only supports the "ixgbe" driver
> > > > > at the moment, so this RFC patch also provides an "ixgbe" PMD using
> > > > > direct-mapped rings as a sample. Support for 40GbE (i40e) will be
> > > > > added in the future.
> > > > >
> > > > > In addition, for other multi-queue NICs with flow director capability
> > > > > to perform packet classification and distribution, there's no special
> > > > > technical gap to providing bifurcated driver support.
> > > > >
> > > > > Limitation
> > > > > ==========
> > > > >
> > > > > By using the "bifurcated driver", user space only takes over the DMA
> > > > > operations. NIC configuration is outside the control of the
> > > > > user-space PMD: all NIC settings, including adding/deleting filter
> > > > > rules, need to be done with standard Linux network tools (e.g.
> > > > > ethtool). So feature support really depends on how much ethtool
> > > > > supports.
> > > > >
> > > > >
> > > > > Any questions, comments and feedback are welcome.
> > > > >
> > > > >
> > > > > -END-
> > > > >
> > > > > Signed-off-by: Cunming Liang <cunming.liang@intel.com>
> > > > > Signed-off-by: Danny Zhou <danny.zhou@intel.com>
> > > > >
> > > > > Cunming Liang (6):
> > > > >   eal: common direct ring access API
> > > > >   eal: direct ring access support by linux af_packet
> > > > >   pci: allow VDEV as pci device during device driver probe
> > > > >   bifurc: add driver to scan bifurcated netdev
> > > > >   ixgbe: rx/tx queue stop bug fix
> > > > >   ixgbe: PMD for bifurc ixgbe net device
> > > > >
> > > > >  config/common_linuxapp                         |   5 +
> > > > >  lib/Makefile                                   |   1 +
> > > > >  lib/librte_bifurc/Makefile                     |  58 +++++
> > > > >  lib/librte_bifurc/rte_bifurc.c                 | 284 +++++++++++++++++++++
> > > > >  lib/librte_bifurc/rte_bifurc.h                 |  90 +++++++
> > > > >  lib/librte_eal/common/Makefile                 |   5 +
> > > > >  lib/librte_eal/common/include/rte_pci.h        |   4 +
> > > > >  lib/librte_eal/common/include/rte_pci_bifurc.h | 186 ++++++++++++++
> > > > >  lib/librte_eal/linuxapp/eal/Makefile           |   1 +
> > > > >  lib/librte_eal/linuxapp/eal/eal_pci.c          |  42 ++--
> > > > >  lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c   | 336 +++++++++++++++++++++++++
> > > > >  lib/librte_ether/rte_ethdev.c                  |   3 +-
> > > > >  lib/librte_pmd_ixgbe/Makefile                  |  13 +-
> > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.c         | 303 ++++++++++++++++++++++
> > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.h         |  57 +++++
> > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c              |  44 +++-
> > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.h              |  10 +
> > > > >  mk/rte.app.mk                                  |   6 +
> > > > >  18 files changed, 1421 insertions(+), 27 deletions(-)
> > > > >  create mode 100644 lib/librte_bifurc/Makefile
> > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.c
> > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.h
> > > > >  create mode 100644 lib/librte_eal/common/include/rte_pci_bifurc.h
> > > > >  create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c
> > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.c
> > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.h
> > > > >
> > > > > --
> > > > > 1.8.1.4
> > > > >
> > > > >
> > > > AIUI, the bifurcated driver hasn't yet been accepted upstream, has it?
> > > > Given that, I don't think it's wise to pull this in yet ahead of the
> > > > kernel work, as there may still be kernel-side changes that the user
> > > > space pmd will have to adapt to.
> > > > Neil
> > > >
> > > Hence the RFC nature of the patch, I believe. :-) Before the kernel part
> > > hits the main kernel tree, we can at least discuss the overall direction
> > > to be taken for this driver, because it's significantly different from
> > > any other HW driver.
> > >
> > > /Bruce