From: "Wang, Zhihong"
To: Stephen Hemminger
Cc: "dev@dpdk.org"
Date: Tue, 19 Jan 2016 02:37:46 +0000
Subject: Re: [dpdk-dev] [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
Message-ID: <8F6C2BD409508844A0EFC19955BE0941033A8CF5@SHSMSX103.ccr.corp.intel.com>
In-Reply-To: <20160118120629.5ed7bcd9@xeon-e3>
References: <1452752002-107586-1-git-send-email-zhihong.wang@intel.com>
 <1453086314-30158-1-git-send-email-zhihong.wang@intel.com>
 <20160118120629.5ed7bcd9@xeon-e3>

> -----Original Message-----
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, January 19, 2016 4:06 AM
> To: Wang, Zhihong
> Cc: dev@dpdk.org; Ananyev, Konstantin; Richardson, Bruce; Xie, Huawei
> Subject: Re: [PATCH v2 0/5] Optimize memcpy for AVX512 platforms
>
> On Sun, 17 Jan 2016 22:05:09 -0500
> Zhihong Wang wrote:
>
> > This patch set optimizes DPDK memcpy for AVX512 platforms, to make full
> > utilization of hardware resources and deliver high performance.
> >
> > In current DPDK, memcpy holds a large proportion of execution time in
> > libs like Vhost, especially for large packets, and this patch can bring
> > considerable benefits.
> >
> > The implementation is based on the current DPDK memcpy framework, some
> > background introduction can be found in these threads:
> > http://dpdk.org/ml/archives/dev/2014-November/008158.html
> > http://dpdk.org/ml/archives/dev/2015-January/011800.html
> >
> > Code changes are:
> >
> >   1. Read CPUID to check if AVX512 is supported by CPU
> >
> >   2. Predefine AVX512 macro if AVX512 is enabled by compiler
> >
> >   3. Implement AVX512 memcpy and choose the right implementation based on
> >      predefined macros
> >
> >   4. Decide alignment unit for memcpy perf test based on predefined macros
>
> Cool, I like it. How much impact does this have on VHOST?

The impact is significant, especially for enqueue, because VHOST actually
spends a lot of time doing memcpy. (Detailed numbers might not be
appropriate here due to policy :-), so I can only describe how I test it.)
Simply measure the RX/TX time cost for 1024B and compare it with the cost
for 64B and you'll get a sense of it, although the result is not precise.

My test cases include NIC2VM2NIC and VM2VM scenarios, which are the main
use cases currently, and I use both throughput and RX/TX cycles for
evaluation.
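
For readers following the thread, items 2 to 4 of the quoted cover letter (selecting an implementation and an alignment unit from predefined macros) can be sketched roughly as below. This is not the code from the patch set: `COPY_UNIT`, `copy_unit` and `sketch_copy` are hypothetical names, and the compiler's own `__AVX512F__` and `__AVX2__` macros are assumed here as stand-ins for whatever macros the DPDK build defines. The runtime CPUID check of item 1 is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: the compiler-predefined macros __AVX512F__ / __AVX2__
 * (set by GCC/Clang with e.g. -mavx512f or -mavx2) stand in for the
 * macros the patch set predefines. All names below are hypothetical. */
#if defined(__AVX512F__)
#include <immintrin.h>
#define COPY_UNIT 64 /* bytes moved per vector load/store */
static inline void copy_unit(unsigned char *dst, const unsigned char *src)
{
    /* One unaligned 512-bit load followed by one unaligned store. */
    _mm512_storeu_si512((void *)dst, _mm512_loadu_si512((const void *)src));
}
#elif defined(__AVX2__)
#include <immintrin.h>
#define COPY_UNIT 32
static inline void copy_unit(unsigned char *dst, const unsigned char *src)
{
    /* One unaligned 256-bit load followed by one unaligned store. */
    _mm256_storeu_si256((__m256i *)dst,
                        _mm256_loadu_si256((const __m256i *)src));
}
#else
#define COPY_UNIT 16
static inline void copy_unit(unsigned char *dst, const unsigned char *src)
{
    /* Portable fallback: let the C library copy one 16-byte unit. */
    memcpy(dst, src, COPY_UNIT);
}
#endif

/* Copy n bytes in COPY_UNIT-sized steps, then finish the tail with
 * memcpy. Buffers must not overlap. Returns dst, like memcpy. */
static void *sketch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    assert(n == 0 || (d != NULL && s != NULL));

    while (n >= COPY_UNIT) {
        copy_unit(d, s);
        d += COPY_UNIT;
        s += COPY_UNIT;
        n -= COPY_UNIT;
    }
    if (n > 0)
        memcpy(d, s, n);
    return dst;
}
```

With GCC or Clang, building with `-mavx512f` takes the 64-byte branch, while a default x86-64 build falls back to the 16-byte branch. A binary built for the 64-byte branch will fault on a CPU without AVX512, which is presumably why the cover letter also lists a CPUID check.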