From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 47257A00C2; Sat, 9 Apr 2022 17:18:56 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id D91224068A; Sat, 9 Apr 2022 17:18:55 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by mails.dpdk.org (Postfix) with ESMTP id E0C374067E for ; Sat, 9 Apr 2022 17:18:53 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1649517534; x=1681053534; h=from:to:cc:subject:date:message-id:mime-version: content-transfer-encoding; bh=6sIp0PWwtaqecBUeH5oLFuZuK1c706r08NoNeSmmOcc=; b=Rsrfaqswqc6i8ZHfSj0B3UAG+k8a6brJRPzckXY7IlZo5n5sXZraKLds QuGGeeljxD7Na/WoY32PsMYx70zt8tjDO+NZ7SM48mUiyY9b+qEKZgl+r JWqJ7/CNG6QGVpa1Wp77k9JXaKx5PGnHQDLb32nipv0kaofXCbkveC0Cx 0MDjg5THxrT7q8ClhsBUKM+uY4dOh14R1cckz0ko8KrFKTG5e4MEKnHw7 uCc6kiN2gHstUomAKlvB/rA8R/bHkAzbK7LcEy9raFp/FJr+1tyj38B6E b/yJIBlgmAgT4zSnKKylAfk1hZFQ17BHn3hbJx2uTT4q8JisvlL+2swOt w==; X-IronPort-AV: E=McAfee;i="6400,9594,10312"; a="260661472" X-IronPort-AV: E=Sophos;i="5.90,247,1643702400"; d="scan'208";a="260661472" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Apr 2022 08:18:53 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,247,1643702400"; d="scan'208";a="698685577" Received: from txanpdk03.an.intel.com ([10.123.117.78]) by fmsmga001.fm.intel.com with ESMTP; 09 Apr 2022 08:18:52 -0700 From: Timothy McDaniel To: jerinj@marvell.com Cc: dev@dpdk.org Subject: [PATCH] event/dlb2: add support for single 512B write of 4 QEs Date: Sat, 9 Apr 2022 10:18:49 -0500 Message-Id: <20220409151849.1007602-1-timothy.mcdaniel@intel.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Xeon, as 512b accesses are available, movdir64 instruction is able to perform 512b read and write to DLB producer port. In order for movdir64 to be able to pull its data from store buffers (store-buffer-forwarding) (before actual write), data should be in single 512b write format. This commit add change when code is built for Xeon with 512b AVX support to make single 512b write of all 4 QEs instead of 4x64b writes. Signed-off-by: Timothy McDaniel --- drivers/event/dlb2/dlb2.c | 86 ++++++++++++++++++++++++++++++--------- 1 file changed, 67 insertions(+), 19 deletions(-) diff --git a/drivers/event/dlb2/dlb2.c b/drivers/event/dlb2/dlb2.c index 36f07d0061..e2a5303310 100644 --- a/drivers/event/dlb2/dlb2.c +++ b/drivers/event/dlb2/dlb2.c @@ -2776,25 +2776,73 @@ dlb2_event_build_hcws(struct dlb2_port *qm_port, ev[3].event_type, DLB2_QE_EV_TYPE_WORD + 4); - /* Store the metadata to memory (use the double-precision - * _mm_storeh_pd because there is no integer function for - * storing the upper 64b): - * qe[0] metadata = sse_qe[0][63:0] - * qe[1] metadata = sse_qe[0][127:64] - * qe[2] metadata = sse_qe[1][63:0] - * qe[3] metadata = sse_qe[1][127:64] - */ - _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, sse_qe[0]); - _mm_storeh_pd((double *)&qe[1].u.opaque_data, - (__m128d)sse_qe[0]); - _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, sse_qe[1]); - _mm_storeh_pd((double *)&qe[3].u.opaque_data, - (__m128d)sse_qe[1]); - - qe[0].data = ev[0].u64; - qe[1].data = ev[1].u64; - qe[2].data = ev[2].u64; - qe[3].data = ev[3].u64; + #ifdef __AVX512VL__ + + /* + * 1) Build avx512 QE store and build each + * QE individually as XMM register + * 2) Merge the 4 XMM registers/QEs into single AVX512 + * register + * 3) Store single avx512 register to &qe[0] (4x QEs + * stored in 1x store) + */ + + __m128i v_qe0 = _mm_setzero_si128(); + uint64_t meta = _mm_extract_epi64(sse_qe[0], 0); + v_qe0 = _mm_insert_epi64(v_qe0, ev[0].u64, 0); + v_qe0 = _mm_insert_epi64(v_qe0, meta, 1); + + __m128i v_qe1 = _mm_setzero_si128(); + meta = _mm_extract_epi64(sse_qe[0], 1); + v_qe1 = _mm_insert_epi64(v_qe1, ev[1].u64, 0); + v_qe1 = _mm_insert_epi64(v_qe1, meta, 1); + + __m128i v_qe2 = _mm_setzero_si128(); + meta = _mm_extract_epi64(sse_qe[1], 0); + v_qe2 = _mm_insert_epi64(v_qe2, ev[2].u64, 0); + v_qe2 = _mm_insert_epi64(v_qe2, meta, 1); + + __m128i v_qe3 = _mm_setzero_si128(); + meta = _mm_extract_epi64(sse_qe[1], 1); + v_qe3 = _mm_insert_epi64(v_qe3, ev[3].u64, 0); + v_qe3 = _mm_insert_epi64(v_qe3, meta, 1); + + /* we have 4x XMM registers, one per QE. */ + __m512i v_all_qes = _mm512_setzero_si512(); + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe0, 0); + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe1, 1); + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe2, 2); + v_all_qes = _mm512_inserti32x4(v_all_qes, v_qe3, 3); + + /* + * store the 4x QEs in a single register to the scratch + * space of the PMD + */ + _mm512_store_si512(&qe[0], v_all_qes); +#else + /* + * Store the metadata to memory (use the double-precision + * _mm_storeh_pd because there is no integer function for + * storing the upper 64b): + * qe[0] metadata = sse_qe[0][63:0] + * qe[1] metadata = sse_qe[0][127:64] + * qe[2] metadata = sse_qe[1][63:0] + * qe[3] metadata = sse_qe[1][127:64] + */ + _mm_storel_epi64((__m128i *)&qe[0].u.opaque_data, + sse_qe[0]); + _mm_storeh_pd((double *)&qe[1].u.opaque_data, + (__m128d)sse_qe[0]); + _mm_storel_epi64((__m128i *)&qe[2].u.opaque_data, + sse_qe[1]); + _mm_storeh_pd((double *)&qe[3].u.opaque_data, + (__m128d)sse_qe[1]); + + qe[0].data = ev[0].u64; + qe[1].data = ev[1].u64; + qe[2].data = ev[2].u64; + qe[3].data = ev[3].u64; +#endif break; case 3: -- 2.25.1