From: Timothy McDaniel <timothy.mcdaniel@intel.com>
To: jerinj@marvell.com
Cc: dev@dpdk.org
Subject: [PATCH] event/dlb2: add support for single 512B write of 4 QEs
Date: Sat, 9 Apr 2022 10:18:49 -0500 [thread overview]
Message-ID: <20220409151849.1007602-1-timothy.mcdaniel@intel.com> (raw)
On Xeon, as 512b accesses are available, movdir64 instruction is able to
perform 512b read and write to DLB producer port. In order for movdir64
to be able to pull its data from store buffers (store-buffer-forwarding)
(before actual write), data should be in single 512b write format.
This commit add change when code is built for Xeon with 512b AVX support
to make single 512b write of all 4 QEs instead of 4x64b writes.
Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---
drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++---------
1 file changed, 67 insertions(+), 19 deletions(-)
diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index 36f07d0061..e2a5303310 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port,
ev[3].event_type,
DLB2_QE_EV_TYPE_WORD + 4);
- /* Store the metadata to memory (use the double-precision
- * _mm_storeh_pd because there is no integer function for
- * storing the upper 64b):
- * qe[0] metadata = sse_qe[0][63:0]
- * qe[1] metadata = sse_qe[0][127:64]
- * qe[2] metadata = sse_qe[1][63:0]
- * qe[3] metadata = sse_qe[1][127:64]
- */
- _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]);
- _mm_storeh_pd((double *)&qe[1].u.opaque_data,
- (__m128d)sse_qe[0]);
- _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]);
- _mm_storeh_pd((double *)&qe[3].u.opaque_data,
- (__m128d)sse_qe[1]);
-
- qe[0].data = ev[0].u64;
- qe[1].data = ev[1].u64;
- qe[2].data = ev[2].u64;
- qe[3].data = ev[3].u64;
+ #ifdef __AVX512VL__
+
+ /*
+ * 1) Build avx512 QE store and build each
+ * QE individually as XMM register
+ * 2) Merge the 4 XMM registers/QEs into single AVX512
+ * register
+ * 3) Store single avx512 register to &qe[0] (4x QEs
+ * stored in 1x store)
+ */
+
+ __m128i v_qe0 = _mm_setzero_si128();
+ uint64_t meta = _mm_extract_epi64(sse_qe[0], 0);
+ v_qe0 = _mm_insert_epi64(v_qe0, ev[0].u64, 0);
+ v_qe0 = _mm_insert_epi64(v_qe0, meta, 1);
+
+ __m128i v_qe1 = _mm_setzero_si128();
+ meta = _mm_extract_epi64(sse_qe[0], 1);
+ v_qe1 = _mm_insert_epi64(v_qe1, ev[1].u64, 0);
+ v_qe1 = _mm_insert_epi64(v_qe1, meta, 1);
+
+ __m128i v_qe2 = _mm_setzero_si128();
+ meta = _mm_extract_epi64(sse_qe[1], 0);
+ v_qe2 = _mm_insert_epi64(v_qe2, ev[2].u64, 0);
+ v_qe2 = _mm_insert_epi64(v_qe2, meta, 1);
+
+ __m128i v_qe3 = _mm_setzero_si128();
+ meta = _mm_extract_epi64(sse_qe[1], 1);
+ v_qe3 = _mm_insert_epi64(v_qe3, ev[3].u64, 0);
+ v_qe3 = _mm_insert_epi64(v_qe3, meta, 1);
+
+ /* we have 4x XMM registers, one per QE. */
+ __m512i v_all_qes = _mm512_setzero_si512();
+ v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe0, 0);
+ v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe1, 1);
+ v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe2, 2);
+ v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe3, 3);
+
+ /*
+ * store the 4x QEs in a single register to the scratch
+ * space of the PMD
+ */
+ _mm512_store_si512(&qe[0], v_all_qes);
+#else
+ /*
+ * Store the metadata to memory (use the double-precision
+ * _mm_storeh_pd because there is no integer function for
+ * storing the upper 64b):
+ * qe[0] metadata = sse_qe[0][63:0]
+ * qe[1] metadata = sse_qe[0][127:64]
+ * qe[2] metadata = sse_qe[1][63:0]
+ * qe[3] metadata = sse_qe[1][127:64]
+ */
+ _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data,
+ sse_qe[0]);
+ _mm_storeh_pd((double *)&qe[1].u.opaque_data,
+ (__m128d)sse_qe[0]);
+ _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data,
+ sse_qe[1]);
+ _mm_storeh_pd((double *)&qe[3].u.opaque_data,
+ (__m128d)sse_qe[1]);
+
+ qe[0].data = ev[0].u64;
+ qe[1].data = ev[1].u64;
+ qe[2].data = ev[2].u64;
+ qe[3].data = ev[3].u64;
+#endif
break;
case 3:
--
2.25.1
next reply other threads:[~2022-04-09 15:18 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-09 15:18 Timothy McDaniel [this message]
2022-05-14 12:07 ` Jerin Jacob
2022-05-16 8:42 ` Bruce Richardson
2022-05-16 17:00 ` McDaniel, Timothy
2022-05-19 20:24 ` [PATCH v3] " Timothy McDaniel
2022-05-23 16:09 ` [PATCH v4] " Timothy McDaniel
2022-05-23 16:34 ` Bruce Richardson
2022-05-23 16:52 ` McDaniel, Timothy
2022-05-23 16:55 ` Bruce Richardson
2022-06-09 17:40 ` Jerin Jacob
2022-06-09 18:02 ` McDaniel, Timothy
2022-05-23 16:37 ` Bruce Richardson
2022-05-23 16:45 ` McDaniel, Timothy
2022-06-10 12:43 ` [PATCH v6] " Timothy McDaniel
2022-06-10 15:41 ` [PATCH v7] " Timothy McDaniel
2022-06-10 16:15 ` Bruce Richardson
2022-06-10 16:27 ` [PATCH v8] " Timothy McDaniel
2022-06-13 6:30 ` Jerin Jacob
2022-06-13 20:39 ` [PATCH v9] " Timothy McDaniel
2022-06-14 10:40 ` Jerin Jacob
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220409151849.1007602-1-timothy.mcdaniel@intel.com \
--to=timothy.mcdaniel@intel.com \
--cc=dev@dpdk.org \
--cc=jerinj@marvell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).