From: "Zhou, Danny"
To: "Walukiewicz, Miroslaw", "Richardson, Bruce"
Cc: "dev@dpdk.org"
Date: Wed, 26 Nov 2014 12:22:47 +0000
Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
In-Reply-To: <7C4248CAE043B144B1CD242D275626532FE0EC8B@IRSMSX104.ger.corp.intel.com>
References: <1416924682-24170-1-git-send-email-cunming.liang@intel.com>
 <20141125142316.GD23352@hmsreliant.think-freely.org>
 <20141125142956.GA6672@bricha3-MOBL3>
 <7C4248CAE043B144B1CD242D275626532FE0E6E5@IRSMSX104.ger.corp.intel.com>
 <20141125150257.GB6800@bricha3-MOBL3>
 <7C4248CAE043B144B1CD242D275626532FE0EC8B@IRSMSX104.ger.corp.intel.com>
> -----Original Message-----
> From: Walukiewicz, Miroslaw
> Sent: Wednesday, November 26, 2014 6:45 PM
> To: Zhou, Danny; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
>
> Thank you for the explanation.
>
> I still have a few questions regarding the setup flow:
>
> 1. Why do we need this step:
> > > > > > > 3. Setup a flow director rule to distribute packets with source ip
> > > > > > > 0.0.0.0 to rxq No.0
> > > > > > > > ethtool -N eth0 flow-type udp4 src-ip 0.0.0.0 action 0
>

DZ: By default, the ixgbe kernel driver uses 32 rx/tx queue pairs (0-31).
The example above sets up a filter that routes a UDP flow with src_ip
0.0.0.0 to queue No.0, which is used by the kernel driver's rx/tx routine.

>
> 2. You presented the filter setup for receiving all udp4 packets on a specific queue
> > > > > > > 5. Setup a flow director rule to distribute packets with source ip
> > > > > > > 1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > > > > > ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
>
> How to configure flow director to receive all packets with dst-ip = 1.1.1.1 on qpair=32?

DZ: You can certainly do that with an ethtool command line such as
"ethtool -N eth0 flow-type udp4 dst-ip 1.1.1.1 action 32".

> Will TCP SYN packets be caught by such a filter setup?

DZ: Unfortunately, unlike DPDK, which provides the ixgbe_add_syn_filter()
API to program the SYN Packet Queue Filter register, the in-kernel ixgbe
driver does not touch that register, although I have seen the ixgbe 3.18.7
driver hard-code a value in it. In either case, there is no easy way to
configure it through the ixgbe bifurcated driver, and in bifurcated mode
DPDK cannot access that register.

> 3. Do we have a possibility to set up a rule like:
> Forward all TCPv4 rx packets with dst-ip=1.1.1.1 and TCP port 2222 to qpair=32 including SYN packets?

DZ: Yes, ethtool and flow director support that. I will send you a separate
email on using ethtool for flow director configuration.

> 3. In your application example you present that qpair number (32) is known before start of application
> > > > > > > > ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > > > > > --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > > > > > -i --rxfreet=32 --txfreet=32 --txrst=32
>
> Is there a possibility of dynamic queue allocation? I ask about the API.
> I mean dynamically attaching and detaching queues from the application level and not specifying the numbers in the command line.
>

DZ: The example is just an experiment. When DPDK requests queue pairs from
the ixgbe bifurcated driver, it specifies only the number of qpairs; the
kernel driver returns the absolute indexes of the assigned qpairs to the
application. The application can then use those indexes to set up flow
director (FD) rules, either by invoking the ethtool command line or by
issuing an ioctl directly to the bifurcated driver.

> 4. Is there a possibility to create a rule with perfect match and directing the packets to the specific queue.
> I mean here a rule like:
> Forward all TCPv4 rx packets with dst-ip=1.1.1.1 src-ip=2.2.2.2 dst-port=2222 src-port=1234 to queue 33
>

DZ: Yes, of course you can.
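A minimal sketch of the perfect-match rule from question 4, following the
same "ethtool -N" form used elsewhere in this thread. It assumes queue 33 has
already been handed to the application by the bifurcated driver and that
ntuple filtering is enabled on eth0. The helper name fdir_rule_cmd and the
APPLY switch are illustrative only; they are not part of ethtool or DPDK.

```shell
#!/bin/sh
# Sketch only: build an ethtool flow director rule that matches the full
# TCPv4 4-tuple from question 4 and steers it to a single rx queue.
# fdir_rule_cmd is a hypothetical helper, not an ethtool or DPDK interface.

fdir_rule_cmd() {
    # $1 iface, $2 src-ip, $3 dst-ip, $4 src-port, $5 dst-port, $6 rx queue
    echo "ethtool -N $1 flow-type tcp4 src-ip $2 dst-ip $3 src-port $4 dst-port $5 action $6"
}

# Values taken from the question; queue 33 is assumed to be assigned to DPDK.
cmd=$(fdir_rule_cmd eth0 2.2.2.2 1.1.1.1 1234 2222 33)
echo "$cmd"

# By default the command is only printed. Set APPLY=1 to program the NIC
# (needs root and "ethtool -K eth0 ntuple on" beforehand).
if [ "${APPLY:-0}" = "1" ]; then
    $cmd
fi
```

Once applied, "ethtool -n eth0" should list the rule. As noted above, the SYN
filter stage is outside what this rule controls.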
> Regards,
>
> Mirek
>
> > -----Original Message-----
> > From: Zhou, Danny
> > Sent: Tuesday, November 25, 2014 4:23 PM
> > To: Richardson, Bruce; Walukiewicz, Miroslaw
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > Sent: Tuesday, November 25, 2014 11:03 PM
> > > To: Walukiewicz, Miroslaw
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> > >
> > > On Tue, Nov 25, 2014 at 02:57:13PM +0000, Walukiewicz, Miroslaw wrote:
> > > > Thank you Bruce for explanation of the idea.
> > >
> > > Actually, credit goes to Steve Liang, not me, for the explanation. :-)
> > >
> > > >
> > > > I have a question regarding TCP SYN packets. Do you have any idea how to
> > > > share the TCP SYN requests between kernel and user-space application?
> > >
> > > As I'm giving the credit to Steve, I'll also pass the buck for answering that
> > > question to him too! :-)
> > >
> > > /Bruce
> >
> > In ixgbe's Rx queuing flow, the SYN filter stage comes before the Flow
> > Director filter stage. In bifurcated driver mode, DPDK cannot access NIC
> > registers other than the ones used to rx/tx packets on its assigned rx/tx
> > queue pairs. So it is really up to the user to set up the SYN filter via
> > the ixgbe bifurcated driver, using ethtool or another interface. The user
> > can distribute TCP SYN packets either to rx queues owned by the kernel
> > bifurcated driver or to rx queues owned by DPDK; in the latter case, DPDK
> > can still push them back to the kernel via KNI if it does not want to use
> > them. If you have a user-space TCP/IP stack on top of DPDK, you can push
> > them to that upper-level stack instead.
> >
> > > >
> > > > Regards,
> > > >
> > > > Mirek
> > > >
> > > > > -----Original Message-----
> > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
> > > > > Sent: Tuesday, November 25, 2014 3:30 PM
> > > > > To: Neil Horman
> > > > > Cc: dev@dpdk.org
> > > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/6] DPDK support to bifurcated driver
> > > > >
> > > > > On Tue, Nov 25, 2014 at 09:23:16AM -0500, Neil Horman wrote:
> > > > > > On Tue, Nov 25, 2014 at 10:11:16PM +0800, Cunming Liang wrote:
> > > > > > >
> > > > > > > This is a RFC patch set to support "bifurcated driver" in DPDK.
> > > > > > >
> > > > > > >
> > > > > > > What is "bifurcated driver"?
> > > > > > > ===========================
> > > > > > >
> > > > > > > The "bifurcated driver" stands for the kernel NIC driver that supports:
> > > > > > >
> > > > > > > 1. on-demand rx/tx queue pairs split-off and assignment to user space
> > > > > > >
> > > > > > > 2. direct NIC resource (e.g. rx/tx queue registers) access from user space
> > > > > > >
> > > > > > > 3. distributing packets to kernel or user space rx queues by
> > > > > > >    NIC's flow director according to the filter rules
> > > > > > >
> > > > > > > Here's the kernel patch set to support.
> > > > > > > http://comments.gmane.org/gmane.linux.network/333615
> > > > > > >
> > > > > > >
> > > > > > > Usage scenario
> > > > > > > =================
> > > > > > >
> > > > > > > It's well accepted by industry to use DPDK to process fast path packets in
> > > > > > > user space in a high performance fashion, meanwhile processing slow path
> > > > > > > control packets in kernel space is still needed as those packets usually
> > > > > > > rely on in_kernel TCP/IP stacks and/or socket programming interface.
> > > > > > >
> > > > > > > KNI (Kernel NIC Interface) mechanism in DPDK is designed to meet this
> > > > > > > requirement, with the limitations below:
> > > > > > >
> > > > > > >   1) Software classifies packets and distributes them to kernel via DPDK
> > > > > > >      software rings, at the cost of significant CPU cycles and memory bandwidth.
> > > > > > >
> > > > > > >   2) Memory copy of packets between the kernel's socket buffer and mbuf brings
> > > > > > >      significant negative performance impact to KNI performance.
> > > > > > >
> > > > > > > The bifurcated driver provides an alternative approach that not only offloads
> > > > > > > flow classification and distribution to NIC but also supports packet zero_copy.
> > > > > > >
> > > > > > > User can use standard ethtool to add filter rules to the NIC in order to
> > > > > > > distribute specific flows to the queues only accessed by kernel driver and
> > > > > > > stack, and add other rules to distribute packets to the queues assigned to
> > > > > > > user-space.
> > > > > > >
> > > > > > > For those rx/tx queue pairs that are directly accessed from user space,
> > > > > > > DPDK takes over the packets rx/tx as well as corresponding DMA operation
> > > > > > > for high performance packet I/O.
> > > > > > >
> > > > > > >
> > > > > > > What's the impact and change to DPDK
> > > > > > > ======================================
> > > > > > >
> > > > > > > DPDK usually binds PCIe NIC devices by leveraging the kernel's user space driver
> > > > > > > mechanism UIO or VFIO to map the NIC's entire PCIe I/O space to user space.
> > > > > > > The bifurcated driver PMD talks to a NIC interface using raw socket APIs and
> > > > > > > only mmap()s limited I/O space (e.g. certain 4K pages) for accessing the involved
> > > > > > > rx/tx queue pairs. So the impact and changes mainly come with below:
> > > > > > >
> > > > > > > - netdev
> > > > > > >     DPDK needs to create an af_packet socket and bind it to a bifurcated netdev.
> > > > > > >     The socket fd will be used to request 'queue pairs info',
> > > > > > >     'split/return queue pairs', etc. The PCIe device ID, netdev MAC address,
> > > > > > >     numa info are also from the netdev response.
> > > > > > >
> > > > > > > - PCIe device scan and driver probe
> > > > > > >     netdev provides the PCIe device ID information. Refer to the device ID,
> > > > > > >     the correct driver should be used. And for such netdev device, the creation
> > > > > > >     of PCIe device is no longer from scan but the on-demand assignment.
> > > > > > >
> > > > > > > - PCIe BAR mapping
> > > > > > >     "bifurcated driver" maps several pages for the queue pairs.
> > > > > > >     Other BAR register space maps to a fake page. The BAR mapping goes through
> > > > > > >     mmap on sockfd, which is a little different from what UIO/VFIO does.
> > > > > > >
> > > > > > > - PMD
> > > > > > >     The PMD will no longer really initialize and configure NIC.
> > > > > > >     Instead, it only takes care of the queue pair setup, rx_burst and tx_burst.
> > > > > > >
> > > > > > > The patch uses eal '--vdev' parameter to assign netdev iface name and number of
> > > > > > > queue pairs. Here's an example about how to configure the bifurcated driver and
> > > > > > > run DPDK testpmd with bifurcated PMD.
> > > > > > >
> > > > > > >   1. Set promisc mode
> > > > > > > >      ifconfig eth0 promisc
> > > > > > >
> > > > > > >   2. Turn on fdir
> > > > > > > >      ethtool -K eth0 ntuple on
> > > > > > >
> > > > > > >   3. Setup a flow director rule to distribute packets with source ip
> > > > > > >      0.0.0.0 to rxq No.0
> > > > > > > >      ethtool -N eth0 flow-type udp4 src-ip 0.0.0.0 action 0
> > > > > > >
> > > > > > >   4. Run testpmd on netdev 'eth0' with 1 queue pair.
> > > > > > > >      ./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 \
> > > > > > > >      --vdev=rte_bifurc,iface=eth0,qpairs=1 -- \
> > > > > > > >      -i --rxfreet=32 --txfreet=32 --txrst=32
> > > > > > >      Note:
> > > > > > >        iface and qpairs arguments above specify the netdev interface name and
> > > > > > >        number of qpairs that user space request from the "bifurcated driver"
> > > > > > >        respectively.
> > > > > > >
> > > > > > >   5. Setup a flow director rule to distribute packets with source ip
> > > > > > >      1.1.1.1 to rxq No.32. This needs to be done after testpmd starts.
> > > > > > > >      ethtool -N eth0 flow-type udp4 src-ip 1.1.1.1 action 32
> > > > > > >
> > > > > > > Below illustrates the detailed changes in this patch set.
> > > > > > >
> > > > > > > eal
> > > > > > > --------
> > > > > > > The first two patches are all about the eal API declaration and Linux version
> > > > > > > definition to support af_packet socket and verbs of bifurcated netdev.
> > > > > > > Those APIs include the verbs like open, bind, (un)map, split/return, map_umem.
> > > > > > > And other APIs like set_pci, get_ifinfo and get/put_devargs which help to
> > > > > > > generate pci device from bifurcated netdev and get basic netdev info.
> > > > > > >
> > > > > > > The third patch is used to allow probing driver on the PCIe VDEV created from
> > > > > > > a NIC interface driven by "bifurcated driver". It defines a new flag
> > > > > > > 'RTE_PCI_DRV_BIFURC' used for direct ring access PMD.
> > > > > > >
> > > > > > > librte_bifurc
> > > > > > > ---------------
> > > > > > > The library is used as a VDEV bus driver to scan '--vdev=rte_bifurc' VDEV
> > > > > > > from eal command-line. It generates the PCIe VDEV device ready for further
> > > > > > > driver probe. It maintains the bifurcated device information, including sockfd,
> > > > > > > hwaddr, mtu, qpairs, iface_name. It's used for other direct ring access PMD
> > > > > > > to apply for bifurcated device info.
> > > > > > >
> > > > > > > direct ring access PMD
> > > > > > > -------------------------
> > > > > > > The patch provides direct ring access PMD for ixgbe. Compared to the normal
> > > > > > > ixgbe PMD, it uses 'RTE_PCI_DRV_BIFURC' flag during self registration.
> > > > > > > It mostly reuses the existing PMD ops to avoid re-implementing everything
> > > > > > > from scratch. And it also modifies the rx/tx_queue_setup to allow queue
> > > > > > > setup from any queue offset.
> > > > > > >
> > > > > > > Supported NIC driver
> > > > > > > ========================
> > > > > > >
> > > > > > > The "bifurcated driver" kernel patch only supports "ixgbe" driver at the moment,
> > > > > > > so this RFC patch also provides "ixgbe" PMD via direct-mapped rings as sample.
> > > > > > > The support for 40GE (i40e) will be added in the future.
> > > > > > >
> > > > > > > In addition, for those multi-queue enabled NICs with flow director capability
> > > > > > > to perform packet classification and distribution, there's no special
> > > > > > > technical gap to provide bifurcated driver approach support.
> > > > > > >
> > > > > > > Limitation
> > > > > > > ============
> > > > > > >
> > > > > > > By using "bifurcated driver", user space only takes over the DMA operation.
> > > > > > > NIC configuration settings are outside the control of the user space PMD.
> > > > > > > All the NIC settings, including add/del filter rules, need to be done by
> > > > > > > standard Linux network tools (e.g. ethtool).
> > > > > > > So the feature support really depends on how much is supported by ethtool.
> > > > > > >
> > > > > > >
> > > > > > > Any questions, comments and feedback are welcome.
> > > > > > >
> > > > > > >
> > > > > > > -END-
> > > > > > >
> > > > > > > Signed-off-by: Cunming Liang
> > > > > > > Signed-off-by: Danny Zhou
> > > > > > >
> > > > > > > *** BLURB HERE ***
> > > > > > >
> > > > > > > Cunming Liang (6):
> > > > > > >   eal: common direct ring access API
> > > > > > >   eal: direct ring access support by linux af_packet
> > > > > > >   pci: allow VDEV as pci device during device driver probe
> > > > > > >   bifurc: add driver to scan bifurcated netdev
> > > > > > >   ixgbe: rx/tx queue stop bug fix
> > > > > > >   ixgbe: PMD for bifurc ixgbe net device
> > > > > > >
> > > > > > >  config/common_linuxapp                         |   5 +
> > > > > > >  lib/Makefile                                   |   1 +
> > > > > > >  lib/librte_bifurc/Makefile                     |  58 +++++
> > > > > > >  lib/librte_bifurc/rte_bifurc.c                 | 284 +++++++++++++++++++++
> > > > > > >  lib/librte_bifurc/rte_bifurc.h                 |  90 +++++++
> > > > > > >  lib/librte_eal/common/Makefile                 |   5 +
> > > > > > >  lib/librte_eal/common/include/rte_pci.h        |   4 +
> > > > > > >  lib/librte_eal/common/include/rte_pci_bifurc.h | 186 ++++++++++++++
> > > > > > >  lib/librte_eal/linuxapp/eal/Makefile           |   1 +
> > > > > > >  lib/librte_eal/linuxapp/eal/eal_pci.c          |  42 ++--
> > > > > > >  lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c   | 336 +++++++++++++++++++++++++
> > > > > > >  lib/librte_ether/rte_ethdev.c                  |   3 +-
> > > > > > >  lib/librte_pmd_ixgbe/Makefile                  |  13 +-
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.c         | 303 ++++++++++++++++++++++
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_bifurcate.h         |  57 +++++
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c              |  44 +++-
> > > > > > >  lib/librte_pmd_ixgbe/ixgbe_rxtx.h              |  10 +
> > > > > > >  mk/rte.app.mk                                  |   6 +
> > > > > > >  18 files changed, 1421 insertions(+), 27 deletions(-)
> > > > > > >  create mode 100644 lib/librte_bifurc/Makefile
> > > > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.c
> > > > > > >  create mode 100644 lib/librte_bifurc/rte_bifurc.h
> > > > > > >  create mode 100644 lib/librte_eal/common/include/rte_pci_bifurc.h
> > > > > > >  create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_bifurc.c
> > > > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.c
> > > > > > >  create mode 100644 lib/librte_pmd_ixgbe/ixgbe_bifurcate.h
> > > > > > >
> > > > > > > --
> > > > > > > 1.8.1.4
> > > > > > >
> > > > > > >
> > > > > > AIUI, the bifurcated driver hasn't yet been accepted upstream, has it? Given
> > > > > > that, I don't think it's wise to pull this in yet ahead of the kernel work, as
> > > > > > there may still be kernel side changes that the user space pmd will have to
> > > > > > adapt to.
> > > > > > Neil
> > > > > >
> > > > > Hence the RFC nature of the patch, I believe. :-) Before the kernel part hits
> > > > > the main kernel tree we can at least discuss the overall direction to be taken
> > > > > for this driver because it's significantly different than any other HW driver.
> > > > >
> > > > > /Bruce
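
For reference, the five setup steps from the quoted RFC can be collected in
one place. The sketch below only prints the commands in order and runs
nothing, since testpmd is interactive and the last rule must be added after
it has started. The variable names are illustrative, and queue 32 is simply
the value from the example; in practice the index is whatever the bifurcated
driver returns.

```shell
#!/bin/sh
# Sketch only: print the RFC's setup steps in order. Nothing is executed.
# IFACE, KERNEL_QUEUE and USER_QUEUE are illustrative names, not DPDK options.
IFACE=eth0
KERNEL_QUEUE=0    # a queue kept by the kernel driver (0-31 by default on ixgbe)
USER_QUEUE=32     # assumed: the queue pair handed to DPDK in the example

setup_steps() {
    # 1. promiscuous mode, 2. enable flow director, 3. rule for a kernel queue
    echo "ifconfig $IFACE promisc"
    echo "ethtool -K $IFACE ntuple on"
    echo "ethtool -N $IFACE flow-type udp4 src-ip 0.0.0.0 action $KERNEL_QUEUE"
    # 4. start testpmd on the bifurcated netdev with one queue pair
    echo "./x86_64-native-linuxapp-gcc/app/testpmd -c 0x3 -n 4 --vdev=rte_bifurc,iface=$IFACE,qpairs=1 -- -i --rxfreet=32 --txfreet=32 --txrst=32"
    # 5. only after testpmd has started: rule for the user space queue
    echo "ethtool -N $IFACE flow-type udp4 src-ip 1.1.1.1 action $USER_QUEUE"
}

steps=$(setup_steps)
echo "$steps"
```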