From: Bruce Richardson <bruce.richardson@intel.com>
To: dev@dpdk.org
Cc: Helin Zhang <helin.zhang@intel.com>,
	Jingjing Wu <jingjing.wu@intel.com>,
	 Bruce Richardson <bruce.richardson@intel.com>
Subject: [dpdk-dev] [PATCH v2 2/3] i40e: improve performance of vector PMD
Date: Thu, 14 Apr 2016 17:02:36 +0100
Message-ID: <1460649757-11862-3-git-send-email-bruce.richardson@intel.com>
In-Reply-To: <1460649757-11862-1-git-send-email-bruce.richardson@intel.com>

An analysis of the i40e code using Intel® VTune™ Amplifier 2016 showed
that the code was unexpectedly causing stalls due to "Loads blocked by
Store Forwards". This can occur when a load from memory has to wait
because a prior store to the same address was smaller than the load,
i.e. the stored value cannot be forwarded directly to the load.
[See ref: https://software.intel.com/en-us/node/544454]

These stalls are due to the way in which the data_len values are handled
in the driver. The lengths are extracted using vector operations, but those
16-bit lengths are then assigned using scalar operations, i.e. 16-bit
stores.

These regular 16-bit stores actually have two effects in the code:
* they cause the "Loads blocked by Store Forwards" issues reported
* they also force the earlier descriptor loads in the RX function to become
  a load followed by a store to an address on the stack, because a 16-bit
  assignment cannot be made directly into an xmm register.

By converting the 16-bit store operations into a sequence of SSE blend
operations, we can ensure that the descriptor loads only occur once, and
avoid both the additional stores and loads from the stack, as well as the
stalls due to the blocked loads.

Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
---
 drivers/net/i40e/i40e_rxtx_vec.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx_vec.c b/drivers/net/i40e/i40e_rxtx_vec.c
index 1e2fadd..9f67f9d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec.c
@@ -192,11 +192,7 @@ desc_to_olflags_v(__m128i descs[4], struct rte_mbuf **rx_pkts)
 static inline void
 desc_pktlen_align(__m128i descs[4])
 {
-	__m128i pktlen0, pktlen1, zero;
-	union {
-		uint16_t e[4];
-		uint64_t dword;
-	} vol;
+	__m128i pktlen0, pktlen1;
 
 	/* mask everything except pktlen field*/
 	const __m128i pktlen_msk = _mm_set_epi32(PKTLEN_MASK, PKTLEN_MASK,
@@ -206,18 +202,18 @@ desc_pktlen_align(__m128i descs[4])
 	pktlen1 = _mm_unpackhi_epi32(descs[1], descs[3]);
 	pktlen0 = _mm_unpackhi_epi32(pktlen0, pktlen1);
 
-	zero = _mm_xor_si128(pktlen0, pktlen0);
-
 	pktlen0 = _mm_srli_epi32(pktlen0, PKTLEN_SHIFT);
 	pktlen0 = _mm_and_si128(pktlen0, pktlen_msk);
 
-	pktlen0 = _mm_packs_epi32(pktlen0, zero);
-	vol.dword = _mm_cvtsi128_si64(pktlen0);
-	/* let the descriptor byte 15-14 store the pkt len */
-	*((uint16_t *)&descs[0]+7) = vol.e[0];
-	*((uint16_t *)&descs[1]+7) = vol.e[1];
-	*((uint16_t *)&descs[2]+7) = vol.e[2];
-	*((uint16_t *)&descs[3]+7) = vol.e[3];
+	pktlen0 = _mm_packs_epi32(pktlen0, pktlen0);
+
+	descs[3] = _mm_blend_epi16(descs[3], pktlen0, 0x80);
+	pktlen0 = _mm_slli_epi64(pktlen0, 16);
+	descs[2] = _mm_blend_epi16(descs[2], pktlen0, 0x80);
+	pktlen0 = _mm_slli_epi64(pktlen0, 16);
+	descs[1] = _mm_blend_epi16(descs[1], pktlen0, 0x80);
+	pktlen0 = _mm_slli_epi64(pktlen0, 16);
+	descs[0] = _mm_blend_epi16(descs[0], pktlen0, 0x80);
 }
 
  /*
-- 
2.5.5
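
The blend sequence in the patch can be checked outside the driver. The
following is a minimal standalone sketch, not driver code: it assumes an
x86-64 compiler that accepts GCC/Clang target attributes, and the names
set_len_scalar, set_len_blend, get_len and blend_matches_scalar are
hypothetical helpers introduced only for illustration. It starts from four
32-bit lengths that have already been shifted and masked, writes them into
bytes 15-14 of each descriptor with both the old scalar stores and the new
blends, and compares the results.

```c
#include <stdint.h>
#include <string.h>
#include <smmintrin.h> /* SSE4.1: _mm_blend_epi16 */

/* Enable SSE4.1 per function so no extra compiler flag is needed. */
#define SSE41 __attribute__((target("sse4.1")))

/* Read the 16-bit value held in bytes 15-14 of one descriptor. */
static uint16_t
get_len(const __m128i *desc)
{
	uint16_t v;

	memcpy(&v, (const uint8_t *)desc + 14, sizeof(v));
	return v;
}

/* Old approach: one 16-bit scalar store into each descriptor. */
SSE41 static void
set_len_scalar(__m128i descs[4], __m128i len32)
{
	uint16_t e[8];
	int i;

	_mm_storeu_si128((__m128i *)e,
			_mm_packs_epi32(len32, _mm_setzero_si128()));
	for (i = 0; i < 4; i++)
		memcpy((uint8_t *)&descs[i] + 14, &e[i], sizeof(e[i]));
}

/* New approach: keep everything in registers and blend 16-bit lane 7. */
SSE41 static void
set_len_blend(__m128i descs[4], __m128i len32)
{
	/* 16-bit lanes become {l0, l1, l2, l3, l0, l1, l2, l3} */
	__m128i len16 = _mm_packs_epi32(len32, len32);

	descs[3] = _mm_blend_epi16(descs[3], len16, 0x80); /* lane 7 = l3 */
	len16 = _mm_slli_epi64(len16, 16);                 /* lane 7 = l2 */
	descs[2] = _mm_blend_epi16(descs[2], len16, 0x80);
	len16 = _mm_slli_epi64(len16, 16);                 /* lane 7 = l1 */
	descs[1] = _mm_blend_epi16(descs[1], len16, 0x80);
	len16 = _mm_slli_epi64(len16, 16);                 /* lane 7 = l0 */
	descs[0] = _mm_blend_epi16(descs[0], len16, 0x80);
}

/* Returns 1 if both approaches yield identical descriptors. */
SSE41 int
blend_matches_scalar(void)
{
	static const uint16_t expect[4] = { 64, 128, 576, 1500 };
	/* _mm_set_epi32 takes the highest 32-bit lane first */
	const __m128i len32 = _mm_set_epi32(1500, 576, 128, 64);
	__m128i a[4], b[4];
	int i;

	for (i = 0; i < 4; i++)
		a[i] = b[i] = _mm_set1_epi8((char)(0x10 + i));

	set_len_scalar(a, len32);
	set_len_blend(b, len32);

	for (i = 0; i < 4; i++)
		if (get_len(&b[i]) != expect[i])
			return 0;
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Calling blend_matches_scalar() from a small test program should return 1 if
the two variants agree. The sketch only checks functional equivalence; it
says nothing about the stall behaviour, which would need to be measured
with a profiler on the real RX path.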
