From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 5591EA050A
	for ; Wed, 13 Apr 2022 12:32:39 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 47CE740E09;
	Wed, 13 Apr 2022 12:32:39 +0200 (CEST)
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by mails.dpdk.org (Postfix) with ESMTP id 9F16A40E09;
	Wed, 13 Apr 2022 12:32:37 +0200 (CEST)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 223CE1570;
	Wed, 13 Apr 2022 03:32:37 -0700 (PDT)
Received: from net-arm-n1amp-02.shanghai.arm.com
	(net-arm-n1amp-02.shanghai.arm.com [10.169.210.142])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5DA843F73B;
	Wed, 13 Apr 2022 03:32:34 -0700 (PDT)
From: Ruifeng Wang
To: ajit.khaparde@broadcom.com, somnath.kotur@broadcom.com
Cc: dev@dpdk.org, honnappa.nagarahalli@arm.com, nd@arm.com,
	Ruifeng Wang, lance.richardson@broadcom.com, stable@dpdk.org
Subject: [PATCH 3/3] net/bnxt: fix risk in Rx descriptor read in NEON path
Date: Wed, 13 Apr 2022 18:31:56 +0800
Message-Id: <20220413103156.3680600-4-ruifeng.wang@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220413103156.3680600-1-ruifeng.wang@arm.com>
References: <20220413103156.3680600-1-ruifeng.wang@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: patches for DPDK stable branches
Errors-To: stable-bounces@dpdk.org

The Rx descriptor contains a valid bit that indicates readiness of the
rest of the descriptor words. Hence, the word containing the valid bit
must be read before the other words.

In the NEON vector path, two contiguous 8B descriptors are loaded into a
single NEON register. Since a vector load does not guarantee 16B
atomicity, the read of the word that includes the valid bit could be
reordered after the read of the other words. In this case, the data
could be invalid.

Reload the lower 64b after the read barrier. This ensures that what is
fetched is correct.

Also fix comments that do not pertain to the Arm architecture.

Fixes: deae85145c64 ("net/bnxt: handle multiple packets per loop in vector Rx")
Cc: lance.richardson@broadcom.com
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang
---
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
index 779e23ac4f..32f8e59b3a 100644
--- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
+++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
@@ -231,25 +231,38 @@ recv_burst_vec_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		}
 
 		/*
-		 * Load the four current descriptors into SSE registers in
-		 * reverse order to ensure consistent state.
+		 * Load the four current descriptors into NEON registers.
+		 * IO barriers are used to ensure consistent state.
 		 */
 		rxcmp1[3] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 7]);
 		rte_io_rmb();
+		/* Reload lower 64b of descriptors to make it ordered after info3_v. */
+		rxcmp1[3] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 7],
+				vreinterpretq_u64_u32(rxcmp1[3]), 0));
 		rxcmp[3] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 6]);
 
 		rxcmp1[2] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 5]);
 		rte_io_rmb();
+		rxcmp1[2] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 5],
+				vreinterpretq_u64_u32(rxcmp1[2]), 0));
 		rxcmp[2] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 4]);
 
 		t1 = vreinterpretq_u64_u32(vzip2q_u32(rxcmp1[2], rxcmp1[3]));
 
 		rxcmp1[1] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 3]);
 		rte_io_rmb();
+		rxcmp1[1] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 3],
+				vreinterpretq_u64_u32(rxcmp1[1]), 0));
 		rxcmp[1] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 2]);
 
 		rxcmp1[0] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 1]);
 		rte_io_rmb();
+		rxcmp1[0] = vreinterpretq_u32_u64(vld1q_lane_u64
+				((void *)&cpr->cp_desc_ring[cons + 1],
+				vreinterpretq_u64_u32(rxcmp1[0]), 0));
 		rxcmp[0] = vld1q_u32((void *)&cpr->cp_desc_ring[cons + 0]);
 
 		t0 = vreinterpretq_u64_u32(vzip2q_u32(rxcmp1[0], rxcmp1[1]));
-- 
2.25.1
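
For readers without an Arm target at hand, the "load, barrier, reload" pattern in the hunk above can be sketched in portable C11. This is only an illustration under stated assumptions: struct demo_desc, DEMO_VALID, demo_publish() and demo_consume() are hypothetical names, the two-word layout is not the bnxt completion layout, and an acquire fence stands in for rte_io_rmb().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-word record; NOT the real bnxt descriptor layout. */
struct demo_desc {
	_Atomic uint64_t lo;	/* payload word */
	_Atomic uint64_t hi;	/* word whose bit 0 acts as the valid flag */
};

#define DEMO_VALID 1ULL

/* Producer: write the payload first, then publish the valid bit. */
static void demo_publish(struct demo_desc *d, uint64_t payload)
{
	atomic_store_explicit(&d->lo, payload, memory_order_relaxed);
	atomic_store_explicit(&d->hi, DEMO_VALID, memory_order_release);
}

/* Consumer: returns true and fills *payload only if the record is valid. */
static bool demo_consume(struct demo_desc *d, uint64_t *payload)
{
	/*
	 * First pass: both words are read with no ordering between them.
	 * This stands in for the single 16B vector load in the patch.
	 */
	uint64_t lo = atomic_load_explicit(&d->lo, memory_order_relaxed);
	uint64_t hi = atomic_load_explicit(&d->hi, memory_order_relaxed);

	/* Read barrier: loads after this point are ordered after 'hi'. */
	atomic_thread_fence(memory_order_acquire);

	/* Reload the payload word so it is ordered after the valid word. */
	lo = atomic_load_explicit(&d->lo, memory_order_relaxed);

	if (!(hi & DEMO_VALID))
		return false;

	*payload = lo;
	return true;
}
```

The first read of lo is deliberately discarded. Only the copy fetched after the fence is used, which mirrors how the patch overwrites lane 0 of each rxcmp1[] register after the barrier.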