From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Yao, Lei A", Yuanhan Liu
Cc: "Liang, Cunming", "Tan, Jianfeng", dev@dpdk.org, "Wang, Zhihong"
Date: Thu, 9 Mar 2017 15:38:26 +0100
Message-ID: <10540602-9947-5c19-97f3-eeede49dde27@redhat.com>
In-Reply-To: <2DBBFF226F7CF64BAFCA79B681719D953A15F40C@shsmsx102.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path

On 03/08/2017 07:01 AM, Yao, Lei A wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Monday, March 6, 2017 10:11 PM
>> To: Yuanhan Liu
>> Cc: Liang, Cunming; Tan, Jianfeng; dev@dpdk.org; Wang, Zhihong; Yao, Lei A
>> Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in
>> receive path
>>
>> On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
>>> On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
>>>>
>>>> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
>>>>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
>>>>>>
>>>>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
>>>>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
>>>>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
>>>>>>>> optimize cache utilization, as it puts the Virtio-net header (which
>>>>>>>> is always accessed) on the same cache line as the packet header.
>>>>>>>>
>>>>>>>> For example, with an application that forwards packets at the L2 level,
>>>>>>>> a single cache line will be accessed with this patch, instead of
>>>>>>>> two before.
>>>>>>>
>>>>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
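To make the quoted description concrete: with a 64-byte cache line, 128 bytes
of mbuf headroom and a 12-byte mergeable virtio-net header (all assumed typical
values), the offset arithmetic works out as below. This is only a sketch of the
idea, not the patch itself, which adjusts the mbuf data offset in the virtio
Rx path:

#include <stdio.h>

int main(void)
{
	/* Assumed values for illustration only. */
	const unsigned cache_line = 64;   /* cache-line size in bytes */
	const unsigned headroom   = 128;  /* mbuf headroom (RTE_PKTMBUF_HEADROOM) */
	const unsigned hdr_size   = 12;   /* virtio-net hdr, mergeable buffers */

	/* Layout before the patch: header placed right before the packet data. */
	unsigned hdr_off = headroom - hdr_size;   /* 116 */
	unsigned pkt_off = headroom;              /* 128 */
	printf("before: hdr in cache line %u, pkt in cache line %u\n",
	       hdr_off / cache_line, pkt_off / cache_line);   /* lines 1 and 2 */

	/* With the alignment: the header starts on a cache-line boundary and
	 * the packet data follows it within the same line. */
	unsigned hdr_off_aligned = hdr_off - (hdr_off % cache_line);  /* 64 */
	unsigned pkt_off_aligned = hdr_off_aligned + hdr_size;        /* 76 */
	printf("after:  hdr in cache line %u, pkt in cache line %u\n",
	       hdr_off_aligned / cache_line,
	       pkt_off_aligned / cache_line);                  /* both line 1 */

	return 0;
}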
>>>>>>
>>>>>> No, I tested with 64-byte packets only.
>>>>>
>>>>> Oh, my bad, I overlooked it. While you were saying "a single cache
>>>>> line", I was thinking of putting the virtio net hdr and the "whole"
>>>>> packet data in a single cache line, which is not possible for pkt
>>>>> size 64B.
>>>>>
>>>>>> I ran some more tests this morning with different packet sizes,
>>>>>> and also changed the mbuf size on the guest side to get multi-
>>>>>> buffer packets:
>>>>>>
>>>>>> +-------+--------+--------+-------------------------+
>>>>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
>>>>>> +-------+--------+--------+-------------------------+
>>>>>> |    64 |   2048 |  11.05 |                   11.78 |
>>>>>> |   128 |   2048 |  10.66 |                   11.48 |
>>>>>> |   256 |   2048 |  10.47 |                   11.21 |
>>>>>> |   512 |   2048 |  10.22 |                   10.88 |
>>>>>> |  1024 |   2048 |   7.65 |                    7.84 |
>>>>>> |  1500 |   2048 |   6.25 |                    6.45 |
>>>>>> |  2000 |   2048 |   5.31 |                    5.43 |
>>>>>> |  2048 |   2048 |   5.32 |                    4.25 |
>>>>>> |  1500 |    512 |   3.89 |                    3.98 |
>>>>>> |  2048 |    512 |   1.96 |                    2.02 |
>>>>>> +-------+--------+--------+-------------------------+
>>>>>
>>>>> Could you share more info, say, is it a PVP test? Is mergeable on?
>>>>> What's the fwd mode?
>>>>
>>>> No, this is not a PVP benchmark; I have neither another server nor a packet
>>>> generator connected back-to-back to my Haswell machine.
>>>>
>>>> This is a simple micro-benchmark: vhost PMD in txonly, Virtio PMD in
>>>> rxonly. In this configuration, mergeable is ON and no offloads are
>>>> disabled in the QEMU cmdline.
>>>
>>> Okay, I see. So the boost, as you have stated, comes from reducing two
>>> cache-line accesses to one. Before that, vhost writes 2 cache lines,
>>> while the virtio PMD reads 2 cache lines: one for reading the header,
>>> another one for reading the ether header, for updating xstats (there
>>> is no ether access in the fwd mode you tested).
>>>
>>>> That's why I would be interested in more testing on recent hardware
>>>> with a PVP benchmark. Is it something that could be run in the Intel lab?
>>>
>>> I think Yao Lei could help on that? But as stated, I think it may
>>> break the performance for big packets. And I also wouldn't expect a big
>>> boost even for 64B in a PVP test, judging that it's only a 6% boost in
>>> micro-benchmarking.
>>
>> That would be great.
>> Note that on SandyBridge, on which I see a drop in perf with the
>> micro-benchmark, I get a 4% gain on the PVP benchmark. So on recent
>> hardware that shows a gain on the micro-benchmark, I'm curious about the
>> gain with the PVP bench.
>>
> Hi Maxime, Yuanhan,
>
> I have executed the PVP and loopback performance tests on my Ivy Bridge server.
> OS: Ubuntu 16.04
> CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> Kernel: 4.4.0
> gcc: 5.4.0
> I use MAC forwarding for the test.
>
> The performance baseline is commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f,
> "eal: optimize aligned memcpy on x86".
> I can see a big performance drop on the mergeable and non-mergeable paths
> after applying this patch.
>
> Mergeable path, loopback test
> packet size    performance vs. base
> 64             -21.76%
> 128            -17.79%
> 260            -20.25%
> 520            -14.80%
> 1024           -9.34%
> 1500           -6.16%
>
> Non-mergeable path, loopback test
> packet size    performance vs. base
> 64             -13.72%
> 128            -10.35%
> 260            -16.40%
> 520            -14.78%
> 1024           -10.48%
> 1500           -6.91%
>
> Mergeable path, PVP test
> packet size    performance vs. base
> 64             -16.33%
>
> Non-mergeable path, PVP test
> packet size    performance vs. base
> 64             -8.69%

Thanks Yao for the testing.
I'm surprised by the PVP results, as even on SandyBridge, where I see a perf
drop in micro-benchmarks, I get an improvement with PVP.
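Regarding Yuanhan's point above about the second cache line on the virtio
side: in rxonly mode the only reason the PMD touches the Ethernet header is
the extended statistics, which classify packets by destination MAC. A
simplified stand-in for that update (hypothetical names, not the driver's
actual code) looks like this:

#include <stdint.h>

/* Hypothetical, simplified stand-in for the virtio PMD's Rx stats update;
 * the names rx_queue_stats and update_rx_stats are made up for illustration. */
struct rx_queue_stats {
	uint64_t multicast;
	uint64_t broadcast;
};

/* 'frame' points to the first byte of the received Ethernet frame. */
static inline void
update_rx_stats(struct rx_queue_stats *st, const uint8_t *frame)
{
	const uint8_t *dst = frame;     /* destination MAC: first 6 bytes */
	int i, bcast = 1;

	for (i = 0; i < 6; i++)
		if (dst[i] != 0xff)
			bcast = 0;

	if (bcast)
		st->broadcast++;
	else if (dst[0] & 0x01)         /* group (multicast) bit */
		st->multicast++;

	/* Reading dst pulls in the cache line holding the Ethernet header:
	 * the second cache line counted in the discussion above. */
}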
I'll try to reproduce the tests on Ivy Bridge to understand what is happening.

Cheers,
Maxime