From: Maxime Coquelin <maxime.coquelin@redhat.com>
To: "Yao, Lei A", Yuanhan Liu
Cc: "Liang, Cunming", "Tan, Jianfeng", dev@dpdk.org, "Wang, Zhihong"
Date: Thu, 9 Mar 2017 15:38:26 +0100
Message-ID: <10540602-9947-5c19-97f3-eeede49dde27@redhat.com>
In-Reply-To: <2DBBFF226F7CF64BAFCA79B681719D953A15F40C@shsmsx102.ccr.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path

On 03/08/2017 07:01 AM, Yao, Lei A wrote:
>
>
>> -----Original Message-----
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Monday, March 6, 2017 10:11 PM
>> To: Yuanhan Liu
>> Cc: Liang, Cunming; Tan, Jianfeng; dev@dpdk.org; Wang, Zhihong; Yao, Lei A
>> Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in
>> receive path
>>
>> On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
>>> On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
>>>>
>>>> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
>>>>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
>>>>>>
>>>>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
>>>>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
>>>>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
>>>>>>>> optimize cache utilization, as it puts the Virtio-net header (which
>>>>>>>> is always accessed) on the same cache line as the packet header.
>>>>>>>>
>>>>>>>> For example, with an application that forwards packets at the L2 level,
>>>>>>>> a single cache line will be accessed with this patch, instead of
>>>>>>>> two before.
>>>>>>>
>>>>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
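To make the quoted description concrete: with a 64-byte cache line, 128 bytes
of mbuf headroom and a 12-byte mergeable virtio-net header (all assumed typical
values), the offset arithmetic works out as below. This is only a sketch of the
idea, not the patch itself, which adjusts the mbuf data offset in the virtio
Rx path:

#include <stdio.h>

int main(void)
{
	/* Assumed values for illustration only. */
	const unsigned cache_line = 64;   /* cache-line size in bytes */
	const unsigned headroom   = 128;  /* mbuf headroom (RTE_PKTMBUF_HEADROOM) */
	const unsigned hdr_size   = 12;   /* virtio-net hdr, mergeable buffers */

	/* Layout before the patch: header placed right before the packet data. */
	unsigned hdr_off = headroom - hdr_size;   /* 116 */
	unsigned pkt_off = headroom;              /* 128 */
	printf("before: hdr in cache line %u, pkt in cache line %u\n",
	       hdr_off / cache_line, pkt_off / cache_line);   /* lines 1 and 2 */

	/* With the alignment: the header starts on a cache-line boundary and
	 * the packet data follows it within the same line. */
	unsigned hdr_off_aligned = hdr_off - (hdr_off % cache_line);  /* 64 */
	unsigned pkt_off_aligned = hdr_off_aligned + hdr_size;        /* 76 */
	printf("after:  hdr in cache line %u, pkt in cache line %u\n",
	       hdr_off_aligned / cache_line,
	       pkt_off_aligned / cache_line);                  /* both line 1 */

	return 0;
}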
>>>>>>
>>>>>> No, I tested with 64-byte packets only.
>>>>>
>>>>> Oh, my bad, I overlooked it. While you were saying "a single cache
>>>>> line", I was thinking of putting the virtio net hdr and the "whole"
>>>>> packet data in a single cache line, which is not possible for pkt
>>>>> size 64B.
>>>>>
>>>>>> I ran some more tests this morning with different packet sizes,
>>>>>> and also changed the mbuf size on the guest side to get multi-
>>>>>> buffer packets:
>>>>>>
>>>>>> +-------+--------+--------+-------------------------+
>>>>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
>>>>>> +-------+--------+--------+-------------------------+
>>>>>> |    64 |   2048 |  11.05 |                   11.78 |
>>>>>> |   128 |   2048 |  10.66 |                   11.48 |
>>>>>> |   256 |   2048 |  10.47 |                   11.21 |
>>>>>> |   512 |   2048 |  10.22 |                   10.88 |
>>>>>> |  1024 |   2048 |   7.65 |                    7.84 |
>>>>>> |  1500 |   2048 |   6.25 |                    6.45 |
>>>>>> |  2000 |   2048 |   5.31 |                    5.43 |
>>>>>> |  2048 |   2048 |   5.32 |                    4.25 |
>>>>>> |  1500 |    512 |   3.89 |                    3.98 |
>>>>>> |  2048 |    512 |   1.96 |                    2.02 |
>>>>>> +-------+--------+--------+-------------------------+
>>>>>
>>>>> Could you share more info, say, is it a PVP test? Is mergeable on?
>>>>> What's the fwd mode?
>>>>
>>>> No, this is not a PVP benchmark; I have neither another server nor a packet
>>>> generator connected back-to-back to my Haswell machine.
>>>>
>>>> This is a simple micro-benchmark: vhost PMD in txonly, Virtio PMD in
>>>> rxonly. In this configuration, mergeable is ON and no offloads are
>>>> disabled in the QEMU cmdline.
>>>
>>> Okay, I see. So the boost, as you have stated, comes from reducing two
>>> cache-line accesses to one. Before that, vhost writes 2 cache lines,
>>> while the virtio PMD reads 2 cache lines: one for reading the header,
>>> another one for reading the ether header, for updating xstats (there
>>> is no ether access in the fwd mode you tested).
>>>
>>>> That's why I would be interested in more testing on recent hardware
>>>> with a PVP benchmark. Is it something that could be run in the Intel lab?
>>>
>>> I think Yao Lei could help on that? But as stated, I think it may
>>> break the performance for big packets. And I also wouldn't expect a big
>>> boost even for 64B in a PVP test, judging that it's only a 6% boost in
>>> micro-benchmarking.
>>
>> That would be great.
>> Note that on SandyBridge, on which I see a drop in perf with the
>> micro-benchmark, I get a 4% gain on the PVP benchmark. So on recent
>> hardware that shows a gain on the micro-benchmark, I'm curious about the
>> gain with the PVP bench.
>>
> Hi Maxime, Yuanhan,
>
> I have executed the PVP and loopback performance tests on my Ivy Bridge server.
> OS: Ubuntu 16.04
> CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> Kernel: 4.4.0
> gcc: 5.4.0
> I use MAC forwarding for the test.
>
> The performance baseline is commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f,
> "eal: optimize aligned memcpy on x86".
> I can see a big performance drop on the mergeable and non-mergeable paths
> after applying this patch.
>
> Mergeable path, loopback test
> packet size    performance vs. base
> 64             -21.76%
> 128            -17.79%
> 260            -20.25%
> 520            -14.80%
> 1024           -9.34%
> 1500           -6.16%
>
> Non-mergeable path, loopback test
> packet size    performance vs. base
> 64             -13.72%
> 128            -10.35%
> 260            -16.40%
> 520            -14.78%
> 1024           -10.48%
> 1500           -6.91%
>
> Mergeable path, PVP test
> packet size    performance vs. base
> 64             -16.33%
>
> Non-mergeable path, PVP test
> packet size    performance vs. base
> 64             -8.69%

Thanks Yao for the testing.
I'm surprised by the PVP results, as even on SandyBridge, where I see a perf
drop in micro-benchmarks, I get an improvement with PVP.
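Regarding Yuanhan's point above about the second cache line on the virtio
side: in rxonly mode the only reason the PMD touches the Ethernet header is
the extended statistics, which classify packets by destination MAC. A
simplified stand-in for that update (hypothetical names, not the driver's
actual code) looks like this:

#include <stdint.h>

/* Hypothetical, simplified stand-in for the virtio PMD's Rx stats update;
 * the names rx_queue_stats and update_rx_stats are made up for illustration. */
struct rx_queue_stats {
	uint64_t multicast;
	uint64_t broadcast;
};

/* 'frame' points to the first byte of the received Ethernet frame. */
static inline void
update_rx_stats(struct rx_queue_stats *st, const uint8_t *frame)
{
	const uint8_t *dst = frame;     /* destination MAC: first 6 bytes */
	int i, bcast = 1;

	for (i = 0; i < 6; i++)
		if (dst[i] != 0xff)
			bcast = 0;

	if (bcast)
		st->broadcast++;
	else if (dst[0] & 0x01)         /* group (multicast) bit */
		st->multicast++;

	/* Reading dst pulls in the cache line holding the Ethernet header:
	 * the second cache line counted in the discussion above. */
}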
I'll try to reproduce the tests on Ivy Bridge to understand what is happening.

Cheers,
Maxime