From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 24E57A0A0E for ; Mon, 10 May 2021 18:22:24 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 1C55F4003E; Mon, 10 May 2021 18:22:24 +0200 (CEST) Received: from NAM11-BN8-obe.outbound.protection.outlook.com (mail-bn8nam11on2045.outbound.protection.outlook.com [40.107.236.45]) by mails.dpdk.org (Postfix) with ESMTP id 9620A4003E for ; Mon, 10 May 2021 18:22:22 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=FrNT0Ff/PQzLeLKVPIDvLBEhUxixoYUIf1dgatlprmmjdbNv7lcC2wu2Vsr7DDQ94TnGgiY+imKfQTZQbhTfG6fbLJnCsXZH0ODb2KMpxJobUOuSK5h/+BoyDWxKNXrhM6f/Q+JYpg/I7hVTSg+h8ErV3JX+2nbmRZfGxgvlOZEfd4Q8VLSUtvnV2HjOP/BKFW1UeyDvX1ajzCkGJ5h/6dQSnUmd3J1BV73FOdTFjADK2qsW45aoCfU9/xT4Qegofqbag4u1Pr0c5JZmV0vzp+9lF8wS+pFCSjXPu5Zi3Uu8Mshw8JkAW4SR3z6f0WuuBRIC/T2fgOVYKVvUxOmbCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XUBGvGcTyAtWDCtesBNXtG0lRuA4LgK/x84qObn0bKM=; b=UBMigFnwIqHf5fzHbs5vkXw4zF5onFP1yRKkHRsArnJduULRJiLovgTGgnVXlimutXbVUHZjv9aIcZoInFy2DfDbDesV/z7RbfU3ZUe9+UQfzeljk28Ob0Rw68g6YenoPaJf0eE9rqdQs+lyXixxoVP3EpYzWKcCUcCAEDldOrwCeLQK1zzzT+8zMmdPMeCoatubByCllWBRGc+FueBTRP1FW6wNlKttFnLM7riGVWrligfFUa4sxKwm8v11H+1yG9xiff28ycUCNUzYzscSMwRzMbhS0xXf0L90A5QfRCylusGOZL50qENgYksEbEDqgKY+uNZUnzn7uCTeFZ7fOw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.34) smtp.rcpttodomain=intel.com smtp.mailfrom=nvidia.com; dmarc=pass (p=none sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=XUBGvGcTyAtWDCtesBNXtG0lRuA4LgK/x84qObn0bKM=; b=J6ValBexTHPHO5W+Nif4xOBEt10c7Bj4Za4kmfKVAYlh1aXXsVJ44Z1FE0t22XFJbdy612jiexQAopYphFB/MvchU8mbo2rCLxLIXuJi1yKaXw6o9Em9K0vrARSbdkE5KRZYAW0HyLPst0wZq7C/rKvaIa6f5WWhQ6wToal/e4T9beNHMhc7Cmh7sEkX1+6eYsnZgZ/tDWdNUrg/OdF4G03X9FCvqJKHZD2JQPRK6P60YMAQrZaI+0wKZiGV1jQhJnWKNXgvZuZb+XOuHYLe4jPRxjx15B9tfeKNBazVXBwWuUIJmKU2bFaoPW2T1iBAhwRPBcgkaEA9OODd7J1iIg== Received: from BN6PR1201CA0019.namprd12.prod.outlook.com (2603:10b6:405:4c::29) by BL1PR12MB5269.namprd12.prod.outlook.com (2603:10b6:208:30b::20) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4108.26; Mon, 10 May 2021 16:22:20 +0000 Received: from BN8NAM11FT040.eop-nam11.prod.protection.outlook.com (2603:10b6:405:4c:cafe::89) by BN6PR1201CA0019.outlook.office365.com (2603:10b6:405:4c::29) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4108.24 via Frontend Transport; Mon, 10 May 2021 16:22:20 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.34) smtp.mailfrom=nvidia.com; intel.com; dkim=none (message not signed) header.d=none;intel.com; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.34 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.34; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.34) by BN8NAM11FT040.mail.protection.outlook.com (10.13.177.166) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.4108.25 via Frontend Transport; Mon, 10 May 2021 16:22:20 +0000 Received: from nvidia.com (172.20.145.6) by HQMAIL107.nvidia.com (172.20.187.13) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Mon, 10 May 2021 16:22:17 +0000 From: Xueming Li To: Wenzhuo Lu CC: Luca Boccassi , David Coyle , dpdk stable Date: Tue, 11 May 2021 00:02:20 +0800 Message-ID: <20210510160258.30982-191-xuemingl@nvidia.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20210510160258.30982-1-xuemingl@nvidia.com> References: <20210510160258.30982-1-xuemingl@nvidia.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-Originating-IP: [172.20.145.6] X-ClientProxiedBy: HQMAIL111.nvidia.com (172.20.187.18) To HQMAIL107.nvidia.com (172.20.187.13) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: f39459ec-02cf-478d-e516-08d913cfc951 X-MS-TrafficTypeDiagnostic: BL1PR12MB5269: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Tmlume8FG6HspPvq0m5Ukc+wFz4p/qwiJe9xtPCgcKefzhxGhzOxeEGmqYNf+1LssiTI69M3bsmyf8+DdeRFWFzyKK3InQSTeqSmfsJINaPUbmNGpohZdarFCNtaJRfwOalnYlQ6XQve2SwQRkg+dabqR+4JZM8/PFWamC1cSpXT1ipWimRpQHBudnquLeBQX3QR9Xzd/DcujK3lCBAAbpTU68/f+qJwhimcNKmiGEFEWYdr1uf4K1LsOFgXTkWXakHY2jMmMiVME9h00WoNPlmuTgpBmccgr0SrC2FS9sox4wDYSibtf04rjwczkztgaxse7rmrVT6II7V9B/FVbjXROcXS7n2NxWgF9qYjg+6LXsZKlT7Z4ZtxKGB3KCfXULfrXqRkFp8xkp0yJ0aqyVRCkRwvHXMKkbV40BLpVjJBRzD4usYLEJnvXASByFundL7OUErnj9cLpZ9pkZwDbmlVYphr0OXYoiW5H9rjEohSNHIbdJdO0/Vhq/0RDqyHW+HgS6mWUCXZtxr3RxHKsqgAid7lwltEJG/GPdS5V/O4f02yGN2lbYRtENoIZD6xqokInLo2/ZEecVzjqretBYm87M8WTRRHdcZd2T5IEJ4SIRRAl+vH0cvSPVh9bkVDFFjGcdVoguc5sGqp74BMKXchELdGRQoGiVTIDj7IaHdL9Umy7JnakHnhaA7BBCmBu4ygcR0s3ihAbSBUdswKnTFSfXtv4i//OS94ulbNl6mj+FRnnk9bwcdHPGyaB/S5xG5c2Uqfcbj/9O6dfWhNug== X-Forefront-Antispam-Report: CIP:216.228.112.34; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid03.nvidia.com; CAT:NONE; SFS:(4636009)(46966006)(36840700001)(36756003)(70206006)(336012)(83380400001)(1076003)(26005)(47076005)(54906003)(55016002)(70586007)(6916009)(86362001)(2616005)(82310400003)(426003)(2906002)(8676002)(30864003)(186003)(966005)(4326008)(16526019)(7696005)(498600001)(8936002)(36860700001)(7636003)(36906005)(53546011)(6286002)(6666004)(5660300002)(356005); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 10 May 2021 16:22:20.3792 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f39459ec-02cf-478d-e516-08d913cfc951 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.34]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT040.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL1PR12MB5269 Subject: [dpdk-stable] patch 'net/iavf: fix crash in AVX512' has been queued to stable release 20.11.2 X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org Sender: "stable" Hi, FYI, your patch has been queued to stable release 20.11.2 Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet. It will be pushed if I get no objections before 05/12/21. So please shout if anyone has objections. Also note that after the patch there's a diff of the upstream commit vs the patch applied to the branch. This will indicate if there was any rebasing needed to apply to the stable branch. If there were code changes for rebasing (ie: not only metadata diffs), please double check that the rebase was correctly done. Queued patches are on a temporary branch at: https://github.com/steevenlee/dpdk This queued commit can be viewed at: https://github.com/steevenlee/dpdk/commit/b59be07a7746d211f26e743df3e745d9882a04a7 Thanks. Xueming Li --- >From b59be07a7746d211f26e743df3e745d9882a04a7 Mon Sep 17 00:00:00 2001 From: Wenzhuo Lu Date: Wed, 14 Apr 2021 15:25:24 +0800 Subject: [PATCH] net/iavf: fix crash in AVX512 Cc: Luca Boccassi [ upstream commit 4eb3dcce7c5dacd57cfb9b6cfb3c1b52846ee9de ] Fix segment fault when failing to get the memory from the pool. If there's no memory in the default cache, fall back to the previous process. The previous AVX2 rearm function is changed to add some AVX512 instructions and changed to a callee of the AVX2 and AVX512 rearm functions. Fixes: 31737f2b66fb ("net/iavf: enable AVX512 for legacy Rx") Reported-by: David Coyle Signed-off-by: Wenzhuo Lu Tested-by: David Coyle --- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 120 +------------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 5 +- drivers/net/iavf/iavf_rxtx_vec_common.h | 203 ++++++++++++++++++++++++ 3 files changed, 209 insertions(+), 119 deletions(-) diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 8f28afc8c5..4c8ed694df 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -10,126 +10,10 @@ #pragma GCC diagnostic ignored "-Wcast-qual" #endif -static inline void +static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { - int i; - uint16_t rx_id; - volatile union iavf_rx_desc *rxdp; - struct rte_mbuf **rxp = &rxq->sw_ring[rxq->rxrearm_start]; - - rxdp = rxq->rx_ring + rxq->rxrearm_start; - - /* Pull 'n' more MBUFs into the software ring */ - if (rte_mempool_get_bulk(rxq->mp, - (void *)rxp, - IAVF_RXQ_REARM_THRESH) < 0) { - if (rxq->rxrearm_nb + IAVF_RXQ_REARM_THRESH >= - rxq->nb_rx_desc) { - __m128i dma_addr0; - - dma_addr0 = _mm_setzero_si128(); - for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { - rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, - dma_addr0); - } - } - rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += - IAVF_RXQ_REARM_THRESH; - return; - } - -#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC - struct rte_mbuf *mb0, *mb1; - __m128i dma_addr0, dma_addr1; - __m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM, - RTE_PKTMBUF_HEADROOM); - /* Initialize the mbufs in vector, process 2 mbufs in one loop */ - for (i = 0; i < IAVF_RXQ_REARM_THRESH; i += 2, rxp += 2) { - __m128i vaddr0, vaddr1; - - mb0 = rxp[0]; - mb1 = rxp[1]; - - /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */ - RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) != - offsetof(struct rte_mbuf, buf_addr) + 8); - vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr); - vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr); - - /* convert pa to dma_addr hdr/data */ - dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0); - dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1); - - /* add headroom to pa values */ - dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room); - dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); - - /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); - } -#else - struct rte_mbuf *mb0, *mb1, *mb2, *mb3; - __m256i dma_addr0_1, dma_addr2_3; - __m256i hdr_room = _mm256_set1_epi64x(RTE_PKTMBUF_HEADROOM); - /* Initialize the mbufs in vector, process 4 mbufs in one loop */ - for (i = 0; i < IAVF_RXQ_REARM_THRESH; - i += 4, rxp += 4, rxdp += 4) { - __m128i vaddr0, vaddr1, vaddr2, vaddr3; - __m256i vaddr0_1, vaddr2_3; - - mb0 = rxp[0]; - mb1 = rxp[1]; - mb2 = rxp[2]; - mb3 = rxp[3]; - - /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */ - RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) != - offsetof(struct rte_mbuf, buf_addr) + 8); - vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr); - vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr); - vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr); - vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr); - - /** - * merge 0 & 1, by casting 0 to 256-bit and inserting 1 - * into the high lanes. Similarly for 2 & 3 - */ - vaddr0_1 = - _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0), - vaddr1, 1); - vaddr2_3 = - _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2), - vaddr3, 1); - - /* convert pa to dma_addr hdr/data */ - dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1); - dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3); - - /* add headroom to pa values */ - dma_addr0_1 = _mm256_add_epi64(dma_addr0_1, hdr_room); - dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); - - /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); - } - -#endif - - rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; - if (rxq->rxrearm_start >= rxq->nb_rx_desc) - rxq->rxrearm_start = 0; - - rxq->rxrearm_nb -= IAVF_RXQ_REARM_THRESH; - - rx_id = (uint16_t)((rxq->rxrearm_start == 0) ? - (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1)); - - /* Update the tail pointer on the NIC */ - IAVF_PCI_REG_WRITE(rxq->qrx_tail, rx_id); + return iavf_rxq_rearm_common(rxq, false); } #define PKTLEN_SHIFT 10 diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index 4aae99031e..83072e3d50 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -13,7 +13,7 @@ #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 -static inline void +static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { int i; @@ -25,6 +25,9 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxdp = rxq->rx_ring + rxq->rxrearm_start; + if (unlikely(!cache)) + return iavf_rxq_rearm_common(rxq, true); + /* We need to pull 'n' more MBUFs into the software ring from mempool * We inline the mempool function here, so we can vectorize the copy * from the cache into the shadow ring. diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 7ad1e0f68a..7629474508 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,6 +11,10 @@ #include "iavf.h" #include "iavf_rxtx.h" +#ifndef __INTEL_COMPILER +#pragma GCC diagnostic ignored "-Wcast-qual" +#endif + static inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -276,4 +280,203 @@ iavf_tx_vec_dev_check_default(struct rte_eth_dev *dev) return 0; } +#ifdef CC_AVX2_SUPPORT +static __rte_always_inline void +iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) +{ + int i; + uint16_t rx_id; + volatile union iavf_rx_desc *rxdp; + struct rte_mbuf **rxp = &rxq->sw_ring[rxq->rxrearm_start]; + + rxdp = rxq->rx_ring + rxq->rxrearm_start; + + /* Pull 'n' more MBUFs into the software ring */ + if (rte_mempool_get_bulk(rxq->mp, + (void *)rxp, + IAVF_RXQ_REARM_THRESH) < 0) { + if (rxq->rxrearm_nb + IAVF_RXQ_REARM_THRESH >= + rxq->nb_rx_desc) { + __m128i dma_addr0; + + dma_addr0 = _mm_setzero_si128(); + for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { + rxp[i] = &rxq->fake_mbuf; + _mm_store_si128((__m128i *)&rxdp[i].read, + dma_addr0); + } + } + rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += + IAVF_RXQ_REARM_THRESH; + return; + } + +#ifndef RTE_LIBRTE_IAVF_16BYTE_RX_DESC + struct rte_mbuf *mb0, *mb1; + __m128i dma_addr0, dma_addr1; + __m128i hdr_room = _mm_set_epi64x(RTE_PKTMBUF_HEADROOM, + RTE_PKTMBUF_HEADROOM); + /* Initialize the mbufs in vector, process 2 mbufs in one loop */ + for (i = 0; i < IAVF_RXQ_REARM_THRESH; i += 2, rxp += 2) { + __m128i vaddr0, vaddr1; + + mb0 = rxp[0]; + mb1 = rxp[1]; + + /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */ + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) != + offsetof(struct rte_mbuf, buf_addr) + 8); + vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr); + vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr); + + /* convert pa to dma_addr hdr/data */ + dma_addr0 = _mm_unpackhi_epi64(vaddr0, vaddr0); + dma_addr1 = _mm_unpackhi_epi64(vaddr1, vaddr1); + + /* add headroom to pa values */ + dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room); + dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); + + /* flush desc with pa dma_addr */ + _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); + _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + } +#else +#ifdef CC_AVX512_SUPPORT + if (avx512) { + struct rte_mbuf *mb0, *mb1, *mb2, *mb3; + struct rte_mbuf *mb4, *mb5, *mb6, *mb7; + __m512i dma_addr0_3, dma_addr4_7; + __m512i hdr_room = _mm512_set1_epi64(RTE_PKTMBUF_HEADROOM); + /* Initialize the mbufs in vector, process 8 mbufs in one loop */ + for (i = 0; i < IAVF_RXQ_REARM_THRESH; + i += 8, rxp += 8, rxdp += 8) { + __m128i vaddr0, vaddr1, vaddr2, vaddr3; + __m128i vaddr4, vaddr5, vaddr6, vaddr7; + __m256i vaddr0_1, vaddr2_3; + __m256i vaddr4_5, vaddr6_7; + __m512i vaddr0_3, vaddr4_7; + + mb0 = rxp[0]; + mb1 = rxp[1]; + mb2 = rxp[2]; + mb3 = rxp[3]; + mb4 = rxp[4]; + mb5 = rxp[5]; + mb6 = rxp[6]; + mb7 = rxp[7]; + + /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */ + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) != + offsetof(struct rte_mbuf, buf_addr) + 8); + vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr); + vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr); + vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr); + vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr); + vaddr4 = _mm_loadu_si128((__m128i *)&mb4->buf_addr); + vaddr5 = _mm_loadu_si128((__m128i *)&mb5->buf_addr); + vaddr6 = _mm_loadu_si128((__m128i *)&mb6->buf_addr); + vaddr7 = _mm_loadu_si128((__m128i *)&mb7->buf_addr); + + /** + * merge 0 & 1, by casting 0 to 256-bit and inserting 1 + * into the high lanes. Similarly for 2 & 3, and so on. + */ + vaddr0_1 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0), + vaddr1, 1); + vaddr2_3 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2), + vaddr3, 1); + vaddr4_5 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr4), + vaddr5, 1); + vaddr6_7 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr6), + vaddr7, 1); + vaddr0_3 = + _mm512_inserti64x4(_mm512_castsi256_si512(vaddr0_1), + vaddr2_3, 1); + vaddr4_7 = + _mm512_inserti64x4(_mm512_castsi256_si512(vaddr4_5), + vaddr6_7, 1); + + /* convert pa to dma_addr hdr/data */ + dma_addr0_3 = _mm512_unpackhi_epi64(vaddr0_3, vaddr0_3); + dma_addr4_7 = _mm512_unpackhi_epi64(vaddr4_7, vaddr4_7); + + /* add headroom to pa values */ + dma_addr0_3 = _mm512_add_epi64(dma_addr0_3, hdr_room); + dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); + + /* flush desc with pa dma_addr */ + _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); + _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + } + } else +#endif + { + struct rte_mbuf *mb0, *mb1, *mb2, *mb3; + __m256i dma_addr0_1, dma_addr2_3; + __m256i hdr_room = _mm256_set1_epi64x(RTE_PKTMBUF_HEADROOM); + /* Initialize the mbufs in vector, process 4 mbufs in one loop */ + for (i = 0; i < IAVF_RXQ_REARM_THRESH; + i += 4, rxp += 4, rxdp += 4) { + __m128i vaddr0, vaddr1, vaddr2, vaddr3; + __m256i vaddr0_1, vaddr2_3; + + mb0 = rxp[0]; + mb1 = rxp[1]; + mb2 = rxp[2]; + mb3 = rxp[3]; + + /* load buf_addr(lo 64bit) and buf_iova(hi 64bit) */ + RTE_BUILD_BUG_ON(offsetof(struct rte_mbuf, buf_iova) != + offsetof(struct rte_mbuf, buf_addr) + 8); + vaddr0 = _mm_loadu_si128((__m128i *)&mb0->buf_addr); + vaddr1 = _mm_loadu_si128((__m128i *)&mb1->buf_addr); + vaddr2 = _mm_loadu_si128((__m128i *)&mb2->buf_addr); + vaddr3 = _mm_loadu_si128((__m128i *)&mb3->buf_addr); + + /** + * merge 0 & 1, by casting 0 to 256-bit and inserting 1 + * into the high lanes. Similarly for 2 & 3 + */ + vaddr0_1 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0), + vaddr1, 1); + vaddr2_3 = + _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr2), + vaddr3, 1); + + /* convert pa to dma_addr hdr/data */ + dma_addr0_1 = _mm256_unpackhi_epi64(vaddr0_1, vaddr0_1); + dma_addr2_3 = _mm256_unpackhi_epi64(vaddr2_3, vaddr2_3); + + /* add headroom to pa values */ + dma_addr0_1 = _mm256_add_epi64(dma_addr0_1, hdr_room); + dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); + + /* flush desc with pa dma_addr */ + _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); + _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + } + } + +#endif + + rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; + if (rxq->rxrearm_start >= rxq->nb_rx_desc) + rxq->rxrearm_start = 0; + + rxq->rxrearm_nb -= IAVF_RXQ_REARM_THRESH; + + rx_id = (uint16_t)((rxq->rxrearm_start == 0) ? + (rxq->nb_rx_desc - 1) : (rxq->rxrearm_start - 1)); + + /* Update the tail pointer on the NIC */ + IAVF_PCI_REG_WRITE(rxq->qrx_tail, rx_id); +} +#endif + #endif -- 2.25.1 --- Diff of the applied patch vs upstream commit (please double-check if non-empty: --- --- - 2021-05-10 23:59:31.530655000 +0800 +++ 0192-net-iavf-fix-crash-in-AVX512.patch 2021-05-10 23:59:26.650000000 +0800 @@ -1 +1 @@ -From 4eb3dcce7c5dacd57cfb9b6cfb3c1b52846ee9de Mon Sep 17 00:00:00 2001 +From b59be07a7746d211f26e743df3e745d9882a04a7 Mon Sep 17 00:00:00 2001 @@ -4,0 +5,3 @@ +Cc: Luca Boccassi + +[ upstream commit 4eb3dcce7c5dacd57cfb9b6cfb3c1b52846ee9de ] @@ -15 +17,0 @@ -Cc: stable@dpdk.org @@ -27 +29 @@ -index cdb51397ff..f5646d6453 100644 +index 8f28afc8c5..4c8ed694df 100644 @@ -160 +162 @@ -index 67184ae3f4..385f44ec47 100644 +index 4aae99031e..83072e3d50 100644 @@ -183 +185 @@ -index 46a18732d3..816e16a937 100644 +index 7ad1e0f68a..7629474508 100644