DPDK patches and discussions
From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
To: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: cunming.liang@intel.com, jianfeng.tan@intel.com, dev@dpdk.org
Subject: Re: [dpdk-dev] [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
Date: Wed, 22 Feb 2017 09:37:34 +0800	[thread overview]
Message-ID: <20170222013734.GJ18844@yliu-dev.sh.intel.com> (raw)
In-Reply-To: <20170221173243.20779-1-maxime.coquelin@redhat.com>

On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
> This patch aligns the Virtio-net header on a cache-line boundary to
> optimize cache utilization, as it puts the Virtio-net header (which
> is always accessed) on the same cache line as the packet header.
> 
> For example, with an application that forwards packets at the L2 level,
> a single cache line will be accessed with this patch, instead of two
> before.

I'm assuming you were testing pkt size <= (64 - hdr_size)?
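
For reference, here is a rough sketch of the arithmetic with assumed
offsets (the default 128-byte mbuf headroom, 64-byte cache lines, and
the 12-byte mergeable vnet header; an illustration only, not code from
the patch):

    /* Illustration only: offsets assumed, not taken from the patch. */
    #include <stdio.h>

    #define CACHE_LINE    64u
    #define HEADROOM      128u  /* default RTE_PKTMBUF_HEADROOM */
    #define VNET_HDR_SIZE 12u   /* struct virtio_net_hdr_mrg_rxbuf */

    static unsigned lines_touched(unsigned first, unsigned last)
    {
        return last / CACHE_LINE - first / CACHE_LINE + 1;
    }

    int main(void)
    {
        unsigned len = CACHE_LINE - VNET_HDR_SIZE; /* 52B payload */

        /* Before the patch: header sits just before the packet data. */
        unsigned hdr_before  = HEADROOM - VNET_HDR_SIZE; /* 116 */
        unsigned data_before = HEADROOM;                 /* 128, aligned */

        /* With the patch: header is cache-line aligned, data follows. */
        unsigned hdr_after  = HEADROOM;                  /* 128 */
        unsigned data_after = HEADROOM + VNET_HDR_SIZE;  /* 140, unaligned */

        printf("before: %u line(s)\n",
               lines_touched(hdr_before, data_before + len - 1)); /* 2 */
        printf("after:  %u line(s)\n",
               lines_touched(hdr_after, data_after + len - 1));   /* 1 */
        return 0;
    }

Under those assumptions the single-line case holds only while the
payload fits in 64 - hdr_size bytes, and the data offset becomes 140,
i.e. no longer 16-byte aligned, which is the memcpy point below.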

> In case of multi-buffer packets, the next segments will be aligned on
> a cache-line boundary, instead of on a cache-line boundary minus the
> size of the vnet header, as before.

The other thing is that this patch always makes the packet data
cache-unaligned for the first packet, which makes Zhihong's memcpy
optimization (for big packets) useless:

    commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f
    Author: Zhihong Wang <zhihong.wang@intel.com>
    Date:   Tue Dec 6 20:31:06 2016 -0500
    
        eal: optimize aligned memcpy on x86
    
        This patch optimizes rte_memcpy for well aligned cases, where both
        dst and src addr are aligned to maximum MOV width. It introduces a
        dedicated function called rte_memcpy_aligned to handle the aligned
        cases with simplified instruction stream. The existing rte_memcpy
        is renamed as rte_memcpy_generic. The selection between the two is
        done at the entry of rte_memcpy.
    
        The existing rte_memcpy is for generic cases: it handles unaligned
        copies and makes stores aligned, and it even makes loads aligned
        for micro-architectures like Ivy Bridge. However, alignment
        handling comes at a price: it adds extra load/store instructions,
        which can cause complications sometimes.
    
        Take DPDK Vhost memcpy with the Mergeable Rx Buffer feature as an
        example: the copy is aligned and remote, and there is a header
        write along with it which is also remote. In this case the memcpy
        instruction stream should be simplified to reduce extra loads and
        stores, thereby reducing the probability of pipeline stalls caused
        by full load/store buffers, so that the actual memcpy instructions
        are issued and the H/W prefetcher goes to work as early as
        possible.
    
        This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
        up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
        from 64 to 1500 bytes.
    
        The test can also be conducted without a NIC, by setting up loopback
        traffic between Virtio and Vhost. For example, modify the macro
        TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
        rebuild and start testpmd in both host and guest, then "start" on
        one side and "start tx_first 32" on the other.
    
        Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
        Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
        Tested-by: Lei Yao <lei.a.yao@intel.com>
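
For context, the selection that commit describes at the entry of
rte_memcpy looks roughly like the sketch below (simplified, with
stand-in stubs for the two specialized copies; ALIGNMENT_MASK is the
maximum MOV width minus one, e.g. 15 with SSE):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define ALIGNMENT_MASK 15 /* max MOV width (16B with SSE) - 1 */

    /* Stand-ins for the real specialized implementations. */
    static inline void *rte_memcpy_aligned(void *d, const void *s, size_t n)
    { return memcpy(d, s, n); }
    static inline void *rte_memcpy_generic(void *d, const void *s, size_t n)
    { return memcpy(d, s, n); }

    static inline void *
    rte_memcpy(void *dst, const void *src, size_t n)
    {
        /* Take the simplified stream only when both src and dst are
         * aligned to the maximum MOV width; otherwise fall back to
         * the generic copy. */
        if (!(((uintptr_t)dst | (uintptr_t)src) & ALIGNMENT_MASK))
            return rte_memcpy_aligned(dst, src, n);
        return rte_memcpy_generic(dst, src, n);
    }

With the vnet header cache-line aligned, the packet data starts at an
unaligned offset (140 in the sketch earlier), so copies of that data
can no longer take the rte_memcpy_aligned fast path.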
    
> 
> Signed-off-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> ---
> 
> Hi,
> 
> I'm sending this patch as an RFC because I get strange results on
> SandyBridge.
> 
> For micro-benchmarks, I measure a +6% gain on Haswell, but a big
> performance drop (~-18%) on SandyBridge.
> When running the PVP benchmark on SandyBridge, however, I measure a +4%
> performance gain.
> 
> So I'd like to call for testing on this patch, especially PVP-like testing
> on newer architectures.
> 
> Regarding SandyBridge, I would be interested to know whether we should
> take the performance drop into account; in the last release, for example,
> we merged a patch anyway even though it caused a performance drop on SB.

Sorry, would you remind me which patch it is?

	--yliu


Thread overview: 11+ messages
2017-02-21 17:32 Maxime Coquelin
2017-02-22  1:37 ` Yuanhan Liu [this message]
2017-02-22  2:49   ` Yang, Zhiyong
2017-02-22  9:39     ` Maxime Coquelin
2017-02-22  9:36   ` Maxime Coquelin
2017-02-23  5:49     ` Yuanhan Liu
2017-03-01  7:36       ` Maxime Coquelin
2017-03-06  8:46         ` Yuanhan Liu
2017-03-06 14:11           ` Maxime Coquelin
2017-03-08  6:01             ` Yao, Lei A
2017-03-09 14:38               ` Maxime Coquelin
