From: Bruce Richardson
To: qi.z.zhang@intel.com, beilei.xing@intel.com
Cc: dev@dpdk.org, helin.zhang@intel.com, ferruh.yigit@intel.com, Bruce Richardson
Date: Tue, 9 Jan 2018 14:32:52 +0000
Message-Id: <20180109143254.234428-1-bruce.richardson@intel.com>
In-Reply-To: <20171123165314.168786-1-bruce.richardson@intel.com>
References: <20171123165314.168786-1-bruce.richardson@intel.com>
Subject: [dpdk-dev] [PATCH v2 0/2] AVX2 Vectorized Rx/Tx functions for i40e
List-Id: DPDK patches and discussions

This patch adds an AVX2 vectorized path to the i40e driver, based on the
existing SSE4.2 version. Using AVX2 instructions gives better performance
than the SSE version, though the percentage increase depends on the exact
settings used. For example:

* Using 16B rather than 32B descriptors gives the biggest benefit since
  2 descriptors at a time can be read, rather than just 1 when 32B ones
  are used.
* Bigger burst sizes for RX give improved performance - while we see an
  improvement with testpmd with the default burst size of 32, burst sizes
  of up to 128 give further improvements
* In my testing, most of the improvement comes from faster processing on
  the RX path, though the improved TX also gives benefit.

This has been tested on a system with CPU: "Intel(R) Xeon(R) Gold 6154
CPU @ 3.00GHz", and I've focused on testing with Rx ring sizes of approx
1k - generally --rxd=1024 and --txd=512, rather than the defaults which
tend to give poorer zero-loss performance due to the smaller amount of
buffering.

V2:
* Fixed incorrect config variable reference in makefile
* Added missing stub function for when vector drivers are disabled
* Added missing references to the new functions when checking for vector
  code paths, e.g. for ring tear-down

Bruce Richardson (2):
  net/i40e: add AVX2 Tx function
  net/i40e: add AVX2 Rx function

 drivers/net/i40e/Makefile             |  19 +
 drivers/net/i40e/i40e_rxtx.c          |  66 ++-
 drivers/net/i40e/i40e_rxtx.h          |   6 +
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 792 ++++++++++++++++++++++++++++++++++
 4 files changed, 880 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/i40e/i40e_rxtx_vec_avx2.c

-- 
2.14.3
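
Editorial illustration of the descriptor-size point in the cover letter: a
256-bit AVX2 register is 32 bytes wide, so one load covers two 16B
descriptors but only one 32B descriptor. The sketch below is plain
arithmetic, not code from the patches; the struct layouts and the names
desc16, desc32, descs_per_load and loads_per_burst are hypothetical and do
not match the real i40e descriptor definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Width of one AVX2 (256-bit) register in bytes. */
#define AVX2_VEC_BYTES 32

/* Hypothetical 16-byte descriptor (assumption, not the i40e layout). */
struct desc16 {
	uint64_t addr;
	uint64_t status;
};

/* Hypothetical 32-byte descriptor (assumption, not the i40e layout). */
struct desc32 {
	uint64_t addr;
	uint64_t status;
	uint64_t reserved[2];
};

/* Number of whole descriptors covered by a single vector load. */
static size_t
descs_per_load(size_t vec_bytes, size_t desc_bytes)
{
	return vec_bytes / desc_bytes;
}

/* Number of vector loads needed to read 'burst' descriptors.
 * Returns 0 if a descriptor does not fit in one vector at all. */
static size_t
loads_per_burst(size_t burst, size_t vec_bytes, size_t desc_bytes)
{
	size_t per_load = descs_per_load(vec_bytes, desc_bytes);

	if (per_load == 0)
		return 0;
	/* Round up so a partial final group still counts as one load. */
	return (burst + per_load - 1) / per_load;
}
```

Under these assumptions, a burst of 32 takes 16 loads with 16B descriptors
and 32 loads with 32B descriptors. This counts descriptor loads only and
says nothing about the measured throughput, which the cover letter notes
depends on the exact settings used.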