DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH] event/dlb2: add support for single 512B write of 4 QEs
@ 2022-04-09 15:18 Timothy McDaniel
  2022-05-14 12:07 ` Jerin Jacob
                   ` (6 more replies)
  0 siblings, 7 replies; 20+ messages in thread
From: Timothy McDaniel @ 2022-04-09 15:18 UTC (permalink / raw)
  To: jerinj; +Cc: dev

On Xeon, as 512b accesses are available, movdir64 instruction is able to
perform 512b read and write to DLB producer port. In order for movdir64
to be able to pull its data from store buffers (store-buffer-forwarding)
(before actual write), data should be in single 512b write format.
This commit add change when code is built for Xeon with 512b AVX support
to make single 512b write of all 4 QEs instead of 4x64b writes.

Signed-off-by: Timothy McDaniel <timothy.mcdaniel@intel.com>
---
 drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 19 deletions(-)

diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c
index 36f07d0061..e2a5303310 100644
--- a/drivers/event/dlb2/dlb2.c
+++ b/drivers/event/dlb2/dlb2.c
@@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port,
 						ev[3].event_type,
 					     DLB2_QE_EV_TYPE_WORD + 4);
 
-		/* Store the metadata to memory (use the double-precision
-		 * _mm_storeh_pd because there is no integer function for
-		 * storing the upper 64b):
-		 * qe[0] metadata = sse_qe[0][63:0]
-		 * qe[1] metadata = sse_qe[0][127:64]
-		 * qe[2] metadata = sse_qe[1][63:0]
-		 * qe[3] metadata = sse_qe[1][127:64]
-		 */
-		_mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]);
-		_mm_storeh_pd((double *)&qe[1].u.opaque_data,
-			      (__m128d)sse_qe[0]);
-		_mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]);
-		_mm_storeh_pd((double *)&qe[3].u.opaque_data,
-			      (__m128d)sse_qe[1]);
-
-		qe[0].data = ev[0].u64;
-		qe[1].data = ev[1].u64;
-		qe[2].data = ev[2].u64;
-		qe[3].data = ev[3].u64;
+ #ifdef __AVX512VL__
+
+			/*
+			 * 1) Build avx512 QE store and build each
+			 *    QE individually as XMM register
+			 * 2) Merge the 4 XMM registers/QEs into single AVX512
+			 *    register
+			 * 3) Store single avx512 register to &qe[0] (4x QEs
+			 *    stored in 1x store)
+			 */
+
+			__m128i v_qe0 = _mm_setzero_si128();
+			uint64_t meta = _mm_extract_epi64(sse_qe[0], 0);
+			v_qe0 = _mm_insert_epi64(v_qe0, ev[0].u64, 0);
+			v_qe0 = _mm_insert_epi64(v_qe0, meta, 1);
+
+			__m128i v_qe1 = _mm_setzero_si128();
+			meta = _mm_extract_epi64(sse_qe[0], 1);
+			v_qe1 = _mm_insert_epi64(v_qe1, ev[1].u64, 0);
+			v_qe1 = _mm_insert_epi64(v_qe1, meta, 1);
+
+			__m128i v_qe2 = _mm_setzero_si128();
+			meta = _mm_extract_epi64(sse_qe[1], 0);
+			v_qe2 = _mm_insert_epi64(v_qe2, ev[2].u64, 0);
+			v_qe2 = _mm_insert_epi64(v_qe2, meta, 1);
+
+			__m128i v_qe3 = _mm_setzero_si128();
+			meta = _mm_extract_epi64(sse_qe[1], 1);
+			v_qe3 = _mm_insert_epi64(v_qe3, ev[3].u64, 0);
+			v_qe3 = _mm_insert_epi64(v_qe3, meta, 1);
+
+			/* we have 4x XMM registers, one per QE. */
+			__m512i v_all_qes = _mm512_setzero_si512();
+			v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe0, 0);
+			v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe1, 1);
+			v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe2, 2);
+			v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe3, 3);
+
+			/*
+			 * store the 4x QEs in a single register to the scratch
+			 * space of the PMD
+			 */
+			_mm512_store_si512(&qe[0], v_all_qes);
+#else
+			/*
+			 * Store the metadata to memory (use the double-precision
+			 * _mm_storeh_pd because there is no integer function for
+			 * storing the upper 64b):
+			 * qe[0] metadata = sse_qe[0][63:0]
+			 * qe[1] metadata = sse_qe[0][127:64]
+			 * qe[2] metadata = sse_qe[1][63:0]
+			 * qe[3] metadata = sse_qe[1][127:64]
+			 */
+			_mm_storel_epi64((__m128i *)&qe[0].u.opaque_data,
+					 sse_qe[0]);
+			_mm_storeh_pd((double *)&qe[1].u.opaque_data,
+				      (__m128d)sse_qe[0]);
+			_mm_storel_epi64((__m128i *)&qe[2].u.opaque_data,
+					 sse_qe[1]);
+			_mm_storeh_pd((double *)&qe[3].u.opaque_data,
+				      (__m128d)sse_qe[1]);
+
+			qe[0].data = ev[0].u64;
+			qe[1].data = ev[1].u64;
+			qe[2].data = ev[2].u64;
+			qe[3].data = ev[3].u64;
+#endif
 
 		break;
 	case 3:
-- 
2.25.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-06-14 10:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-09 15:18 [PATCH] event/dlb2: add support for single 512B write of 4 QEs Timothy McDaniel
2022-05-14 12:07 ` Jerin Jacob
2022-05-16  8:42   ` Bruce Richardson
2022-05-16 17:00   ` McDaniel, Timothy
2022-05-19 20:24 ` [PATCH v3] " Timothy McDaniel
2022-05-23 16:09 ` [PATCH v4] " Timothy McDaniel
2022-05-23 16:34   ` Bruce Richardson
2022-05-23 16:52     ` McDaniel, Timothy
2022-05-23 16:55       ` Bruce Richardson
2022-06-09 17:40         ` Jerin Jacob
2022-06-09 18:02           ` McDaniel, Timothy
2022-05-23 16:37   ` Bruce Richardson
2022-05-23 16:45     ` McDaniel, Timothy
2022-06-10 12:43 ` [PATCH v6] " Timothy McDaniel
2022-06-10 15:41 ` [PATCH v7] " Timothy McDaniel
2022-06-10 16:15   ` Bruce Richardson
2022-06-10 16:27 ` [PATCH v8] " Timothy McDaniel
2022-06-13  6:30   ` Jerin Jacob
2022-06-13 20:39 ` [PATCH v9] " Timothy McDaniel
2022-06-14 10:40   ` Jerin Jacob

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).