From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 22 Aug 2015 03:18:11 +0800
From: "Jun Xiao"
To: "Gray, Mark D"
Cc: dev
Message-ID: <----Tc------lRRzc$02c92e56-5191-4d60-b371-1937fd75dd2b@cloudnetengine.com>
Subject: [dpdk-dev] vSwitch Performance Comparison for NFV Use Case
Reply-To: Jun Xiao
List-Id: patches and discussions about DPDK

Hi Mark,

Last time we discussed methodologies for vSwitch performance comparison,
and the performance data we published was more for typical TCP-based
applications in virtualized data centers.

Today we shared more data for small-packet traffic at
http://cloudnetengine.com/en/blog/2015/08/21/vswitch-performance-comparison-nfv-use-case,
and the performance of OVS-DPDK and CNE vSwitch gets much closer (within
around 10-20%), as the tests do only plain forwarding without any other
features.

On the other hand, it's really hard to find
any public performance data for OVS-DPDK in the pNIC -> vSwitch -> VM ->
vSwitch -> pNIC case. What I observed is that OVS-DPDK generally achieves
less than 3 Mpps on my setup (vhost-user is used instead of IVSHMEM). I
don't know whether that is in line with what you have?

Thanks,
Jun
www.cloudnetengine.com

>From stephen@networkplumber.org Fri Aug 21 22:25:37 2015
Date: Fri, 21 Aug 2015 13:25:36 -0700
From: Stephen Hemminger
To: Zoltan Kiss
Cc: "dev@dpdk.org", dev@openvswitch.org
In-Reply-To: <55D76854.5010306@linaro.org>
References: <55D76854.5010306@linaro.org>
Subject: Re: [dpdk-dev] OVS-DPDK performance problem on ixgbe vector PMD

Use perf top; it gives much better data than oprofile.

On Fri, Aug 21, 2015 at 11:05 AM, Zoltan Kiss wrote:

> Hi,
>
> I've set up a simple packet forwarding perf test on a dual-port 10G
> 82599ES: one port receives 64-byte UDP packets, the other sends them
> out, with one core used. I've used the latest OVS with DPDK 2.1, and the
> first result was only 13.2 Mpps, which was a bit far from the 13.9 I saw
> last year with the same test. The first thing I changed was to revert to
> the old behaviour regarding this issue:
>
> http://permalink.gmane.org/gmane.comp.networking.dpdk.devel/22731
>
> So instead of the new default I've passed 2048 + RTE_PKTMBUF_HEADROOM.
> That increased the performance to 13.5 Mpps, but to figure out what's
> wrong I started to play with the receive functions. First I disabled the
> vector PMD, but ixgbe_recv_pkts_bulk_alloc() was even worse, only
> 12.5 Mpps. So then I enabled scattered RX, and with
> ixgbe_recv_pkts_lro_bulk_alloc() I could manage to get 13.98 Mpps, which
> I guess is as close as possible to the 14.2 line rate (on my HW at
> least, with one core).
>
> Does anyone have a good explanation of why the vector PMD performs so
> significantly worse? I would expect that on a 3.2 GHz i5-4570 one core
> should be able to reach ~14 Mpps; SG and vector PMD shouldn't make a
> difference.
>
> I've tried to look into it with oprofile, but the results were quite
> strange: 35% of the samples were from miniflow_extract, in the part
> where parse_vlan calls data_pull to jump past the MAC addresses.
> The oprofile snippet (1M samples):
>
>   511454      19   0.0037  flow.c:511
>   511458     149   0.0292  dp-packet.h:266
>   51145f    4264   0.8357  dp-packet.h:267
>   511466      18   0.0035  dp-packet.h:268
>   51146d      43   0.0084  dp-packet.h:269
>   511474     172   0.0337  flow.c:511
>   51147a    4320   0.8467  string3.h:51
>   51147e  358763  70.3176  flow.c:99
>   511482       2  3.9e-04  string3.h:51
>   511485    3060   0.5998  string3.h:51
>   511488    1693   0.3318  string3.h:51
>   51148c    2933   0.5749  flow.c:326
>   511491      47   0.0092  flow.c:326
>
> And the corresponding disassembled code:
>
>   511454:  49 83 f9 0d           cmp    r9,0xd
>   511458:  c6 83 81 00 00 00 00  mov    BYTE PTR [rbx+0x81],0x0
>   51145f:  66 89 83 82 00 00 00  mov    WORD PTR [rbx+0x82],ax
>   511466:  66 89 93 84 00 00 00  mov    WORD PTR [rbx+0x84],dx
>   51146d:  66 89 8b 86 00 00 00  mov    WORD PTR [rbx+0x86],cx
>   511474:  0f 86 af 01 00 00     jbe    511629
>
>   51147a:  48 8b 45 00           mov    rax,QWORD PTR [rbp+0x0]
>   51147e:  4c 8d 5d 0c           lea    r11,[rbp+0xc]
>   511482:  49 89 00              mov    QWORD PTR [r8],rax
>   511485:  8b 45 08              mov    eax,DWORD PTR [rbp+0x8]
>   511488:  41 89 40 08           mov    DWORD PTR [r8+0x8],eax
>   51148c:  44 0f b7 55 0c        movzx  r10d,WORD PTR [rbp+0xc]
>   511491:  66 41 81 fa 81 00     cmp    r10w,0x81
>
> My only explanation for this so far is that I misunderstand something
> about the oprofile results.
>
> Regards,
>
> Zoltan
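
To put the packet rates quoted above in context, here is a minimal sketch in C that converts a link speed into a theoretical packets-per-second ceiling and a packet rate into a per-packet cycle budget on one core. It is not taken from the thread: the function names are hypothetical, it assumes "64 byte" means the full Ethernet frame including the CRC, it adds the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap), and it uses the nominal 3.2 GHz clock without accounting for turbo. The result is only the theoretical Ethernet maximum, so it need not match the 14.2 figure cited for that particular hardware.

```c
#include <assert.h>

/* Per-frame wire overhead on Ethernet:
 * 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap. */
#define WIRE_OVERHEAD_BYTES 20.0

/* Theoretical maximum packets per second on a link.
 * Assumption: frame_bytes is the full frame size including the CRC. */
static double line_rate_pps(double link_bits_per_sec, double frame_bytes)
{
    assert(link_bits_per_sec > 0.0 && frame_bytes > 0.0);
    return link_bits_per_sec / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8.0);
}

/* Average CPU cycles one core can spend on each packet at a given rate.
 * Assumption: core_hz is the nominal clock; turbo is ignored. */
static double cycles_per_packet(double core_hz, double pps)
{
    assert(core_hz > 0.0 && pps > 0.0);
    return core_hz / pps;
}
```

Under these assumptions, line_rate_pps(10e9, 64.0) comes to about 14.88 Mpps, and cycles_per_packet(3.2e9, 13.98e6) comes to roughly 229 cycles, against roughly 242 cycles at 13.2 Mpps. The difference between the best and worst results reported in the quoted mail is therefore only a few tens of cycles per packet, which is the scale at which a single stalled load could plausibly matter.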