From: Thomas Monjalon
To: Zhihong Wang
Cc: Yuanhan Liu, dev@dpdk.org, lei.a.yao@intel.com
Date: Tue, 17 Jan 2017 16:08:42 +0100
Message-ID: <1597948.LxUmgnGZos@xps13>
In-Reply-To: <20161208021843.GM31182@yliu-dev.sh.intel.com>
References: <1480641582-56186-1-git-send-email-zhihong.wang@intel.com> <1481074266-4461-1-git-send-email-zhihong.wang@intel.com> <20161208021843.GM31182@yliu-dev.sh.intel.com>
Subject: Re: [dpdk-dev] [PATCH v2] eal: optimize aligned rte_memcpy

2016-12-08 10:18, Yuanhan Liu:
> On Tue, Dec 06, 2016 at 08:31:06PM -0500, Zhihong Wang wrote:
> > This patch optimizes rte_memcpy for well aligned cases, where both
> > dst and src addr are aligned to maximum MOV width. It introduces a
> > dedicated function called rte_memcpy_aligned to handle the aligned
> > cases with simplified instruction stream. The existing rte_memcpy
> > is renamed as rte_memcpy_generic. The selection between the two is
> > done at the entry of rte_memcpy.
> >
> > The existing rte_memcpy is for generic cases, it handles unaligned
> > copies and makes stores aligned, it even makes loads aligned for micro
> > architectures like Ivy Bridge. However alignment handling comes at
> > a price: It adds extra load/store instructions, which can cause
> > complications sometimes.
> >
> > DPDK Vhost memcpy with Mergeable Rx Buffer feature as an example:
> > The copy is aligned, and remote, and there is header write along
> > which is also remote. In this case the memcpy instruction stream
> > should be simplified, to reduce extra load/store, therefore reduce
> > the probability of load/store buffer full caused pipeline stall, to
> > let the actual memcpy instructions be issued and let H/W prefetcher
> > go to work as early as possible.
> >
> > This patch is tested on Ivy Bridge, Haswell and Skylake, it provides
> > up to 20% gain for Virtio Vhost PVP traffic, with packet size ranging
> > from 64 to 1500 bytes.
> >
> > The test can also be conducted without NIC, by setting loopback
> > traffic between Virtio and Vhost. For example, modify the macro
> > TXONLY_DEF_PACKET_LEN to the requested packet size in testpmd.h,
> > rebuild and start testpmd in both host and guest, then "start" on
> > one side and "start tx_first 32" on the other.
> >
> >
> > Signed-off-by: Zhihong Wang
>
> Reviewed-by: Yuanhan Liu

Applied, thanks
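
For readers of the archive, the entry-point selection described in the
commit message could look roughly like the sketch below. This is not the
code from the patch: copy_dispatch, copy_aligned, copy_generic,
both_aligned and ALIGN_MASK are hypothetical names, the 32-byte mask is
an assumption (the real value depends on the widest MOV the build
targets), and both paths simply call memcpy as stand-ins for the
specialized copy loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 32-byte alignment; the real mask depends on the widest
 * MOV instruction available on the target. */
#define ALIGN_MASK ((uintptr_t)0x1F)

/* Nonzero when both pointers are aligned to ALIGN_MASK + 1 bytes.
 * OR-ing the addresses lets one test cover both pointers. */
static inline int
both_aligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) & ALIGN_MASK) == 0;
}

/* Hypothetical stand-in for the aligned path: no alignment fix-up,
 * so the instruction stream stays short. */
static inline void *
copy_aligned(void *dst, const void *src, size_t n)
{
	assert(both_aligned(dst, src));
	return memcpy(dst, src, n);
}

/* Hypothetical stand-in for the generic path, which would handle
 * unaligned heads and tails at the cost of extra loads and stores. */
static inline void *
copy_generic(void *dst, const void *src, size_t n)
{
	return memcpy(dst, src, n);
}

/* Entry point: choose the path once, based on pointer alignment. */
static inline void *
copy_dispatch(void *dst, const void *src, size_t n)
{
	if (both_aligned(dst, src))
		return copy_aligned(dst, src, n);
	return copy_generic(dst, src, n);
}
```

The point of the single OR-and-mask test is that the selection costs one
branch at the entry, so callers with aligned buffers skip the alignment
handling entirely.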