From: "Wang, Zhihong"
To: Thomas Monjalon, "Richardson, Bruce"
Cc: "dev@dpdk.org", "De Lara Guarch, Pablo"
Date: Fri, 22 Apr 2016 05:24:17 +0000
Subject: Re: [dpdk-dev] [RFC PATCH 0/2] performance utility in testpmd

> -----Original Message-----
> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
> Sent: Thursday, April 21, 2016 5:54 PM
> To: Wang, Zhihong
> Cc: dev@dpdk.org; De Lara Guarch, Pablo
> Subject: Re: [dpdk-dev] [RFC PATCH 0/2] performance utility in testpmd
>
> 2016-04-20 18:43, Zhihong Wang:
> > This RFC patch proposes a general purpose forwarding engine in testpmd
> > named "portfwd", to enable performance analysis and tuning for poll mode
> > drivers in vSwitching scenarios.
> >
> >
> > Problem statement
> > -----------------
> >
> > vSwitching is more I/O bound in a lot of cases since there are a lot of
> > LLC/cross-core memory accesses.
> >
> > In order to reveal memory/cache behavior in real usage scenarios and enable
> > efficient performance analysis and tuning for vSwitching, DPDK needs a
> > sample application that supports traffic flows close to real deployment,
> > e.g. multi-tenancy, service chaining.
> >
> > There is currently a vhost sample application to enable simple vSwitching
> > scenarios, but it comes with several limitations:
> >
> >   1) Traffic flow is too simple and not flexible
> >
> >   2) Switching based on MAC/VLAN only
> >
> >   3) Not enough performance metrics
> >
> >
> > Proposed solution
> > -----------------
> >
> > The testpmd sample application is a good choice: it's a powerful poll mode
> > driver management framework that hosts various forwarding engines.
>
> Not sure it is a good choice.
> The goal of testpmd is to test every PMD feature.
> How far can we go in adding some stack processing while keeping it
> easily maintainable?

Thanks for the quick response!

This utility is not for vSwitching in particular, it just adds more forwarding
setup capabilities to testpmd.

testpmd is composed of separate components:

  1) The PMD management framework

  2) Forwarding engines, each with:
     a) a traffic setup function
     b) a forwarding function

When adding a new fwd engine, only the new traffic setup function and
forwarding function (maybe cmd handlers too) are added; nothing existing is
touched, so it doesn't make testpmd harder to maintain.

It also doesn't change the current behavior at all: by default it's still
iofwd, and the user switches to portfwd only when flexible forwarding rules
are needed.

Also, I believe that in both the DPDK and OVS-DPDK communities testpmd has
already become a widely used tool to set up performance and functional tests,
and there are some complaints about its usability and flexibility.

Just one of many examples to show why we need a feature-rich fwd engine:

There was an OVS bug reported by Red Hat that took both OVS and DPDK a long
time to investigate, and it turned out to be a testpmd setup issue: they used
testpmd in the guest to do the forwarding, and when multiqueue is enabled,
current testpmd has to use a separate core for each rxq, so insufficient cores
result in unattended rxqs, which is not the expected result, and not a
necessary limitation.

Also, while OVS-DPDK is integrating multiqueue, a lot of cores have to be
assigned to the VM to handle all the rxqs for the test, which limits both
performance and functional testing because a single NUMA node has a limited
number of cores.

Another thing is the learning curve of the DPDK sample applications: we could
actually use portfwd for all kinds of PMD tests (both host and guest; nic
pmds, vhost pmds, virtio pmds, etc.), and it's simple to use, instead of using
different apps, like the vhost sample in the host and testpmd in the guest.

>
> > Now with the vhost pmd feature, it can also handle vhost devices, only a
> > new forwarding engine is needed to make use of it.
>
> Why is a new forwarding engine needed for vhost?

Apologies for my poor English. What I meant is that with the vhost pmd
feature, testpmd has already become a vSwitch; we just need to add more
forwarding setup capability to make use of it.

>
> > portfwd is implemented to this end.
> >
> > Features of portfwd:
> >
> >   1) Build up traffic from simple rx/tx to complex scenarios easily
> >
> >   2) Rich performance statistics for all ports
>
> Have you checked CONFIG_RTE_TEST_PMD_RECORD_CORE_CYCLES and
> CONFIG_RTE_TEST_PMD_RECORD_BURST_STATS?

These stats are good; it'd be even better to have per rx/tx cycle and burst
size info for each port in portfwd, like the table below (a rough sketch of
how such counters could be kept follows it):

cycle stat (since last show)
----------------
port 0, burst 32,
rx, run, min, avg, max,
 0, 0, 0, 0, 0,
 1, 21, 596, 663, 752,
 2, 289, 580, 725, 1056,
 3, 6, 644, 686, 796,
 4, 153, 656, 724, 896,
[...]
32, 1208666, 756, 1206, 19212,
tx, run, min, avg, max,
 0, 0, 0, 0, 0,
[...]
32, 1208652, 476, 559, 12144,
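Below is a minimal sketch (not the actual portfwd code) of how such per-port,
per-burst-size rx cycle counters could be kept; the names burst_cycle_stat and
record_rx_burst are made up purely for illustration, and the tx side would be
symmetric, wrapping rte_eth_tx_burst the same way:

/*
 * Hypothetical per-port, per-burst-size cycle accounting for rx, in the
 * spirit of the table above.  avg is derived as total / run when printing.
 */
#include <stdint.h>
#include <rte_cycles.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define MAX_BURST 32

struct burst_cycle_stat {
	uint64_t run;   /* number of rx bursts that returned this many packets */
	uint64_t min;   /* fewest cycles spent in rx for this burst size */
	uint64_t max;   /* most cycles spent in rx for this burst size */
	uint64_t total; /* sum of cycles, avg = total / run */
};

/* one row per burst size 0..MAX_BURST, per port */
static struct burst_cycle_stat rx_stat[RTE_MAX_ETHPORTS][MAX_BURST + 1];

static inline uint16_t
record_rx_burst(uint16_t port, uint16_t queue, struct rte_mbuf **pkts)
{
	uint64_t start = rte_rdtsc();
	uint16_t nb_rx = rte_eth_rx_burst(port, queue, pkts, MAX_BURST);
	uint64_t cycles = rte_rdtsc() - start;
	struct burst_cycle_stat *s = &rx_stat[port][nb_rx];

	if (s->run == 0 || cycles < s->min)
		s->min = cycles;
	if (cycles > s->max)
		s->max = cycles;
	s->total += cycles;
	s->run++;

	return nb_rx;
}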
>
> >   3) Core affinity manipulation
> >
> >   4) Commands for run time configuration
> >
> > Notice that portfwd has fair performance, but it's not for getting the
> > "maximum" numbers:
> >
> >   1) It buffers packets for burst send efficiency analysis, which increases
> >      latency
> >
> >   2) It touches the packet header and collects performance statistics,
> >      which adds overhead
> >
> > These "extra" overheads are actually what happens in real applications.
> [...]
> > Implementation details
> > ----------------------
> >
> > To enable flexible traffic flow setup, each port has 2 ways to forward
> > packets in portfwd:
>
> Shouldn't it be 2 forward engines?
> Please first describe the existing engines to help make a decision.

It's actually 1 engine. A fwd engine means a forwarding function to be called
in the testpmd framework.

Take iofwd for example: in its fwd function pkt_burst_io_forward, it simply
calls rte_eth_rx_burst for an rxq and then rte_eth_tx_burst to the fixed
mapping txq.

In portfwd it's basically the same, but we get the dst port and queue
dynamically before rte_eth_tx_burst, that's all. (A rough sketch of this
difference is appended at the end of this mail.)

Current engines are:

 * csumonly.c
 * flowgen.c
 * icmpecho.c
 * ieee1588fwd.c
 * iofwd.c
 * macfwd.c
 * macfwd-retry.c
 * macswap.c
 * rxonly.c
 * txonly.c

All of them have fixed traffic setup. For instance, if we have 3 ports, the
traffic will be like this:

  Logical Core 14 (socket 0) forwards packets on 3 streams:
    0: RX P=0/Q=0 (socket 0) -> TX P=2/Q=0 (socket 0) peer=02:00:00:00:00:02
    0: RX P=1/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
    0: RX P=2/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00

And you can't change it into something like: port 0 -> port 1 -> port 2.

Not to mention the multiqueue limitation and core affinity manipulation: when
we have 2 ports, each with 2 queues, running on 1 core, the traffic will be
like this:

  Logical Core 14 (socket 0) forwards packets on 1 streams:
    0: RX P=0/Q=0 (socket 0) -> TX P=1/Q=0 (socket 0) peer=02:00:00:00:00:01

Only 1 rxq will be handled. This is the Red Hat issue I mentioned above.

>
> > 1) Forward based on dst ip
> [...]
> > 2) Forward to a fixed port
> [...]
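As mentioned above, here is a rough sketch of the difference between the fixed
mapping that pkt_burst_io_forward uses and the dynamic destination lookup
portfwd would do. This is only an illustration: the struct stream fields and
the lookup_dst() helper are assumptions standing in for testpmd's fwd_stream
and portfwd's rule lookup, not the real code.

#include <stdint.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

struct stream {
	uint16_t rx_port, rx_queue;
	uint16_t tx_port, tx_queue;  /* fixed mapping, as iofwd uses */
};

/* iofwd-style: rx from one queue, tx to the statically mapped queue */
static void
fixed_fwd(struct stream *fs)
{
	struct rte_mbuf *pkts[BURST];
	uint16_t nb = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts, BURST);
	uint16_t sent = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts, nb);

	while (sent < nb)
		rte_pktmbuf_free(pkts[sent++]);
}

/* hypothetical rule lookup: e.g. by dst ip, or a per-port fixed destination */
void lookup_dst(struct rte_mbuf *pkt, uint16_t rx_port,
		uint16_t *tx_port, uint16_t *tx_queue);

/* portfwd-style: same rx, but the destination is resolved per packet.
 * A real implementation would buffer packets per destination to keep
 * burst tx efficiency, as noted earlier in the thread. */
static void
dynamic_fwd(struct stream *fs)
{
	struct rte_mbuf *pkts[BURST];
	uint16_t nb = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, pkts, BURST);
	uint16_t i, dst_port, dst_queue;

	for (i = 0; i < nb; i++) {
		lookup_dst(pkts[i], fs->rx_port, &dst_port, &dst_queue);
		if (rte_eth_tx_burst(dst_port, dst_queue, &pkts[i], 1) == 0)
			rte_pktmbuf_free(pkts[i]);
	}
}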