From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 27861A057C; Tue, 24 Mar 2020 15:45:39 +0100 (CET) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 89FA62C6D; Tue, 24 Mar 2020 15:45:38 +0100 (CET) Received: from mellanox.co.il (mail-il-dmz.mellanox.com [193.47.165.129]) by dpdk.org (Postfix) with ESMTP id 877102AB for ; Tue, 24 Mar 2020 15:45:37 +0100 (CET) Received: from Internal Mail-Server by MTLPINE1 (envelope-from akozyrev@mellanox.com) with ESMTPS (AES256-SHA encrypted); 24 Mar 2020 16:45:36 +0200 Received: from pegasus01.mtr.labs.mlnx (pegasus01.mtr.labs.mlnx [10.210.16.120]) by labmailer.mlnx (8.13.8/8.13.8) with ESMTP id 02OEjahT029291; Tue, 24 Mar 2020 16:45:36 +0200 From: Alexander Kozyrev To: dev@dpdk.org Cc: rasland@mellanox.com, matan@mellanox.com, viacheslavo@mellanox.com Date: Tue, 24 Mar 2020 16:45:30 +0200 Message-Id: <20200324144530.25426-1-akozyrev@mellanox.com> X-Mailer: git-send-email 2.18.2 Subject: [dpdk-dev] [PATCH] net/mlx5: prefetch CQEs for a faster decompression X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Invalidation of consumed CQEs incurs a performance penalty due to many cache misses caused by a non-sequential CQEs access. Prefetch CQEs to get a better data locality and speed up the decompression of CQEs. Prefetching reduces CPI rate of the rxq_cq_decompress_v() function from 1 to 0.85 in my environment, resulting in 2% boost in mpps for 64B frames single core test. Signed-off-by: Alexander Kozyrev Acked-by: Viacheslav Ovsiienko --- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 +++-- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 6 +++--- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 6 ++++-- 3 files changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index aa43cab084..90548ea22d 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -155,8 +155,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, const vector unsigned long shmax = {64, 64}; #endif - if (!(pos & 0x7) && pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); + for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) + if (likely(pos + i < mcqe_n)) + rte_prefetch0((void *)(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. */ mcqe1 = (vector unsigned char)vec_vsx_ld(0, diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index 6d952df787..44f662e1c1 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -145,9 +145,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, -1UL << ((mcqe_n - pos) * sizeof(uint16_t) * 8) : 0); #endif - - if (!(pos & 0x7) && pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); + for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) + if (likely(pos + i < mcqe_n)) + rte_prefetch0((void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 406f23f595..9db9003acd 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -133,8 +133,10 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif - if (!(pos & 0x7) && pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); + for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) + if (likely(pos + i < mcqe_n)) + rte_prefetch0((void *)(cq + pos + i)); + /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); -- 2.18.2