From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mga04.intel.com (mga04.intel.com [192.55.52.120])
	by dpdk.org (Postfix) with ESMTP id DA640214A
	for ; Thu, 14 Apr 2016 18:03:20 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
	by fmsmga104.fm.intel.com with ESMTP; 14 Apr 2016 09:03:11 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.24,485,1455004800";
	d="scan'208";a="686296039"
Received: from irvmail001.ir.intel.com ([163.33.26.43])
	by FMSMGA003.fm.intel.com with ESMTP; 14 Apr 2016 09:03:12 -0700
Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45])
	by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id u3EG3A1M023862;
	Thu, 14 Apr 2016 17:03:10 +0100
Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1])
	by sivswdev01.ir.intel.com with ESMTP id u3EG3ALt012175;
	Thu, 14 Apr 2016 17:03:10 +0100
Received: (from bricha3@localhost)
	by sivswdev01.ir.intel.com with id u3EG3A2s012171;
	Thu, 14 Apr 2016 17:03:10 +0100
From: Bruce Richardson
To: dev@dpdk.org
Cc: Helin Zhang, Jingjing Wu, Bruce Richardson
Date: Thu, 14 Apr 2016 17:02:37 +0100
Message-Id: <1460649757-11862-4-git-send-email-bruce.richardson@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1460649757-11862-1-git-send-email-bruce.richardson@intel.com>
References: <1460628921-25635-1-git-send-email-bruce.richardson@intel.com>
	<1460649757-11862-1-git-send-email-bruce.richardson@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [dpdk-dev] [PATCH v2 3/3] i40e: simplify SSE packet length extraction code
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK
X-List-Received-Date: Thu, 14 Apr 2016 16:03:21 -0000

In Table 8-16 of the "Intel® Ethernet Controller XL710 Datasheet" it is
stated that when the whole packet is written to a single
buffer, the header length field in the descriptor will be 0. This means
that when extracting the packet/data_len field from the descriptor in the
driver we do not need to mask out the extra header-length bits.

Inside the vector driver, this removes the need to pull all four pktlen
fields into a single register to work on. Instead of a shift and mask, we
now only need to do a shift. Therefore, we can work on each descriptor
independently, processing each using one shift intrinsic and a blend.

This change makes the code shorter and easier to read, so we can pull it
into the main descriptor processing loop instead of needing its own
function. This in turn makes the descriptor processing in the loop as a
whole slightly easier to read as it's more linear.

In testing, this change has little effect on performance, with
single-core perf tests showing a very slight improvement.

Signed-off-by: Bruce Richardson
---
 drivers/net/i40e/i40e_rxtx_vec.c | 51 ++++++++++++++--------------------------
 1 file changed, 17 insertions(+), 34 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 9f67f9d..f7a62a8 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -184,37 +184,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 #define desc_to_olflags_v(desc, rx_pkts) do {} while (0)
 #endif
 
-#define PKTLEN_SHIFT (6)
-#define PKTLEN_MASK (0x3FFF)
-/* Handling the pkt len field is not aligned with 1byte, so shift is
- * needed to let it align
- */
-static inline void
-desc_pktlen_align(__m128i descs[4])
-{
-	__m128i pktlen0, pktlen1;
-
-	/* mask everything except pktlen field*/
-	const __m128i pktlen_msk = _mm_set_epi32(PKTLEN_MASK, PKTLEN_MASK,
-						PKTLEN_MASK, PKTLEN_MASK);
-
-	pktlen0 = _mm_unpackhi_epi32(descs[0], descs[2]);
-	pktlen1 = _mm_unpackhi_epi32(descs[1], descs[3]);
-	pktlen0 = _mm_unpackhi_epi32(pktlen0, pktlen1);
-
-	pktlen0 = _mm_srli_epi32(pktlen0, PKTLEN_SHIFT);
-	pktlen0 = _mm_and_si128(pktlen0, pktlen_msk);
-
-	pktlen0 = _mm_packs_epi32(pktlen0, pktlen0);
-
-	descs[3] = _mm_blend_epi16(descs[3], pktlen0, 0x80);
-	pktlen0 = _mm_slli_epi64(pktlen0, 16);
-	descs[2] = _mm_blend_epi16(descs[2], pktlen0, 0x80);
-	pktlen0 = _mm_slli_epi64(pktlen0, 16);
-	descs[1] = _mm_blend_epi16(descs[1], pktlen0, 0x80);
-	pktlen0 = _mm_slli_epi64(pktlen0, 16);
-	descs[0] = _mm_blend_epi16(descs[0], pktlen0, 0x80);
-}
+#define PKTLEN_SHIFT 10
 
 /*
  * Notice:
@@ -333,12 +303,17 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			rte_prefetch0(&rx_pkts[pos + 3]->cacheline1);
 		}
 
-		/*shift the pktlen field*/
-		desc_pktlen_align(descs);
-
 		/* avoid compiler reorder optimization */
 		rte_compiler_barrier();
 
+		/* pkt 3,4 shift the pktlen field to be 16-bit aligned*/
+		const __m128i len3 = _mm_slli_epi32(descs[3], PKTLEN_SHIFT);
+		const __m128i len2 = _mm_slli_epi32(descs[2], PKTLEN_SHIFT);
+
+		/* merge the now-aligned packet length fields back in */
+		descs[3] = _mm_blend_epi16(descs[3], len3, 0x80);
+		descs[2] = _mm_blend_epi16(descs[2], len2, 0x80);
+
 		/* D.1 pkt 3,4 convert format from desc to pktmbuf */
 		pkt_mb4 = _mm_shuffle_epi8(descs[3], shuf_msk);
 		pkt_mb3 = _mm_shuffle_epi8(descs[2], shuf_msk);
@@ -354,6 +329,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		pkt_mb4 = _mm_add_epi16(pkt_mb4, crc_adjust);
 		pkt_mb3 = _mm_add_epi16(pkt_mb3, crc_adjust);
 
+		/* pkt 1,2 shift the pktlen field to be 16-bit aligned*/
+		const __m128i len1 = _mm_slli_epi32(descs[1], PKTLEN_SHIFT);
+		const __m128i len0 = _mm_slli_epi32(descs[0], PKTLEN_SHIFT);
+
+		/* merge the now-aligned packet length fields back in */
+		descs[1] = _mm_blend_epi16(descs[1], len1, 0x80);
+		descs[0] = _mm_blend_epi16(descs[0], len0, 0x80);
+
 		/* D.1 pkt 1,2 convert format from desc to pktmbuf */
 		pkt_mb2 = _mm_shuffle_epi8(descs[1], shuf_msk);
 		pkt_mb1 = _mm_shuffle_epi8(descs[0], shuf_msk);
-- 
2.5.5
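
Illustrative note, not part of the patch: the shift-only claim in the commit message can be checked with a scalar sketch. It assumes the layout implied by the removed constants (PKTLEN_SHIFT 6, PKTLEN_MASK 0x3FFF): within the upper 32 bits of the descriptor's second qword, the packet length occupies bits 6..19 and the header length sits directly above it, starting at bit 20. The 11-bit header-length width and all function names below are assumptions made for this example only; none of them exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the upper 32 bits of the descriptor's second qword:
 * bits 0..5 other fields, bits 6..19 packet length (14 bits),
 * bits 20..30 header length (width assumed). */
#define OLD_PKTLEN_SHIFT 6
#define OLD_PKTLEN_MASK  0x3FFF
#define NEW_PKTLEN_SHIFT 10

/* Old approach: shift right, then mask off the header-length bits. */
static uint16_t pktlen_shift_mask(uint32_t hi)
{
	return (uint16_t)((hi >> OLD_PKTLEN_SHIFT) & OLD_PKTLEN_MASK);
}

/* New approach: shift left by 10 so the length lands in the top 16 bits
 * of the 32-bit word, i.e. the 16-bit lane that a blend with mask 0x80
 * selects. The final >> 16 only reads that lane back as a scalar. */
static uint16_t pktlen_shift_only(uint32_t hi)
{
	return (uint16_t)((uint32_t)(hi << NEW_PKTLEN_SHIFT) >> 16);
}

/* Hypothetical helper: assemble a test word from its three fields. */
static uint32_t make_hi_dword(uint32_t low6, uint32_t pktlen, uint32_t hdrlen)
{
	return (low6 & 0x3F) | ((pktlen & 0x3FFF) << 6) |
	       ((hdrlen & 0x7FF) << 20);
}

/* With a zero header length both approaches agree for every length. */
static void check_all_lengths(void)
{
	for (uint32_t len = 0; len <= 0x3FFF; len++) {
		uint32_t hi = make_hi_dword(0x3F, len, 0);

		assert(pktlen_shift_mask(hi) == len);
		assert(pktlen_shift_only(hi) == len);
	}
}
```

Calling check_all_lengths() exercises every 14-bit length. With a non-zero header length the two functions can disagree, because the low header-length bits survive the left shift; that is why the simplification depends on the datasheet's statement that the field is 0 for single-buffer packets.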