From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Yao, Lei A"
To: Maxime Coquelin, Yuanhan Liu
Cc: "Liang, Cunming", "Tan, Jianfeng", dev@dpdk.org, "Wang, Zhihong"
Subject: Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
Date: Wed, 8 Mar 2017 06:01:03 +0000
Message-ID: <2DBBFF226F7CF64BAFCA79B681719D953A15F40C@shsmsx102.ccr.corp.intel.com>
In-Reply-To: <3fa785c1-7a7b-b13e-4bd0-ee34ed5985fe@redhat.com>
References: <20170221173243.20779-1-maxime.coquelin@redhat.com>
 <20170222013734.GJ18844@yliu-dev.sh.intel.com>
 <024ad979-8b54-ac33-54b4-5f8753b74d75@redhat.com>
 <20170223054954.GU18844@yliu-dev.sh.intel.com>
 <349f9a71-7407-e45a-4687-a54fe7e778c8@redhat.com>
 <20170306084649.GH18844@yliu-dev.sh.intel.com>
 <3fa785c1-7a7b-b13e-4bd0-ee34ed5985fe@redhat.com>

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, March 6, 2017 10:11 PM
> To: Yuanhan Liu
> Cc: Liang, Cunming; Tan, Jianfeng; dev@dpdk.org; Wang, Zhihong; Yao, Lei A
> Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line
> in receive path
>
>
>
> On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
> > On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
> >>
> >>
> >> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
> >>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
> >>>>
> >>>>
> >>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
> >>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
> >>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
> >>>>>> optimize cache utilization, as it puts the Virtio-net header (which
> >>>>>> is always accessed) on the same cache line as the packet header.
> >>>>>>
> >>>>>> For example, with an application that forwards packets at the L2
> >>>>>> level, a single cache line will be accessed with this patch,
> >>>>>> instead of two before.
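To make the cache-line accounting in the quoted description concrete, the
arithmetic can be sketched as follows. The numbers are assumptions chosen
for illustration (64-byte cache lines, the default 128-byte mbuf headroom,
a 12-byte mergeable virtio-net header, a 14-byte Ethernet header, offsets
measured from a cache-line-aligned buffer start); this is not code from the
patch itself.

/* Count cache lines touched when reading the virtio-net header plus the
 * Ethernet header, for an unaligned vs. a cache-line-aligned header. */
#include <stdio.h>

#define CACHE_LINE   64
#define HEADROOM     128   /* default RTE_PKTMBUF_HEADROOM (assumed) */
#define VNET_HDR_SZ  12    /* struct virtio_net_hdr_mrg_rxbuf */
#define ETH_HDR_SZ   14

static unsigned lines_touched(unsigned hdr_off)
{
    unsigned first = hdr_off / CACHE_LINE;
    unsigned last = (hdr_off + VNET_HDR_SZ + ETH_HDR_SZ - 1) / CACHE_LINE;

    return last - first + 1;
}

int main(void)
{
    /* Pre-patch layout: header squeezed in just before the packet data. */
    unsigned unaligned = HEADROOM - VNET_HDR_SZ;   /* offset 116 */
    /* Patch idea: header starts on a cache-line boundary. */
    unsigned aligned = HEADROOM;                   /* offset 128 */

    printf("unaligned header: %u cache line(s)\n", lines_touched(unaligned));
    printf("aligned header:   %u cache line(s)\n", lines_touched(aligned));
    return 0;
}

In this sketch the unaligned case spans two 64-byte lines (bytes 116-141),
while the aligned case keeps both headers within one line (bytes 128-153),
which is the one-line-instead-of-two effect described above.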
> >>>>>
> >>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
> >>>>
> >>>> No, I tested with 64-byte packets only.
> >>>
> >>> Oh, my bad, I overlooked it. While you were saying "a single cache
> >>> line", I was thinking of putting the virtio-net hdr and the "whole"
> >>> packet data in a single cache line, which is not possible for a 64B
> >>> packet size.
> >>>
> >>>> I ran some more tests this morning with different packet sizes,
> >>>> and also with changing the mbuf size on the guest side to get
> >>>> multi-buffer packets:
> >>>>
> >>>> +-------+--------+--------+-------------------------+
> >>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
> >>>> +-------+--------+--------+-------------------------+
> >>>> |    64 |   2048 |  11.05 |                   11.78 |
> >>>> |   128 |   2048 |  10.66 |                   11.48 |
> >>>> |   256 |   2048 |  10.47 |                   11.21 |
> >>>> |   512 |   2048 |  10.22 |                   10.88 |
> >>>> |  1024 |   2048 |   7.65 |                    7.84 |
> >>>> |  1500 |   2048 |   6.25 |                    6.45 |
> >>>> |  2000 |   2048 |   5.31 |                    5.43 |
> >>>> |  2048 |   2048 |   5.32 |                    4.25 |
> >>>> |  1500 |    512 |   3.89 |                    3.98 |
> >>>> |  2048 |    512 |   1.96 |                    2.02 |
> >>>> +-------+--------+--------+-------------------------+
> >>>
> >>> Could you share more info, say, is it a PVP test? Is mergeable on?
> >>> What's the fwd mode?
> >>
> >> No, this is not a PVP benchmark; I have neither another server nor a
> >> packet generator connected back-to-back to my Haswell machine.
> >>
> >> This is a simple micro-benchmark: vhost PMD in txonly, virtio PMD in
> >> rxonly. In this configuration, mergeable is ON and no offload is
> >> disabled in the QEMU cmdline.
> >
> > Okay, I see. So the boost, as you have stated, comes from reducing two
> > cache-line accesses to one. Before that, vhost writes 2 cache lines,
> > while the virtio PMD reads 2 cache lines: one for reading the header,
> > another one for reading the Ethernet header to update xstats (there is
> > no other Ethernet access in the fwd mode you tested).
> >
> >> That's why I would be interested in more testing on recent hardware
> >> with a PVP benchmark. Is it something that could be run in the Intel
> >> lab?
> >
> > I think Yao Lei could help on that? But as stated, I think it may
> > break the performance for big packets. And I also won't expect a big
> > boost even for 64B in a PVP test, judging that it's only a 6% boost in
> > micro-benchmarking.
> That would be great.
> Note that on Sandy Bridge, where I see a drop in perf with the
> micro-benchmark, I get a 4% gain on the PVP benchmark. So on recent
> hardware that shows a gain on the micro-benchmark, I'm curious about the
> gain with the PVP bench.
>

Hi Maxime, Yuanhan,

I have executed the PVP and loopback performance tests on my Ivy Bridge
server.

OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Kernel: 4.4.0
gcc: 5.4.0

I use MAC forwarding for the test. The performance baseline is commit
f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f, "eal: optimize aligned memcpy
on x86".
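For reference, a testpmd-based vhost/virtio loopback setup of this kind
typically looks like the sketch below. The socket path, core, memory and
vdev parameters are illustrative assumptions, not the exact options behind
the numbers reported here.

# Host side: testpmd with the vhost PMD exposing one vhost-user port
# (QEMU is then started with a matching vhost-user netdev and a
# virtio-net-pci device, mrg_rxbuf=on for the mergeable path).
./testpmd -l 2-4 -n 4 --socket-mem 1024 --no-pci \
    --vdev 'net_vhost0,iface=/tmp/vhost-user.sock,queues=1' \
    -- -i --nb-cores=1
testpmd> set fwd mac
testpmd> start

# Guest side: testpmd on the virtio-net device exposed by QEMU,
# also in MAC forwarding mode.
./testpmd -l 1-2 -n 4 -- -i --nb-cores=1
testpmd> set fwd mac
testpmd> start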
I can see a big performance drop on both the mergeable and non-mergeable
paths after applying this patch.

Mergeable path, loopback test:
packet size    performance vs. base
64             -21.76%
128            -17.79%
260            -20.25%
520            -14.80%
1024           -9.34%
1500           -6.16%

Non-mergeable path, loopback test:
packet size    performance vs. base
64             -13.72%
128            -10.35%
260            -16.40%
520            -14.78%
1024           -10.48%
1500           -6.91%

Mergeable path, PVP test:
packet size    performance vs. base
64             -16.33%

Non-mergeable path, PVP test:
packet size    performance vs. base
64             -8.69%

Best Regards
Lei

> Cheers,
> Maxime