From: Chaoyong He
To: dev@dpdk.org
Cc: oss-drivers@corigine.com, Long Wu, Peng Zhang, Chaoyong He
Subject: [PATCH v3 3/4] net/nfp: support AVX2 Rx function
Date: Tue, 9 Jul 2024 15:29:20 +0800
Message-Id: <20240709072921.246520-4-chaoyong.he@corigine.com>
In-Reply-To: <20240709072921.246520-1-chaoyong.he@corigine.com>
References: <20240708055854.107739-1-chaoyong.he@corigine.com> <20240709072921.246520-1-chaoyong.he@corigine.com>

From: Long Wu

Use AVX2 instructions to accelerate Rx performance. The acceleration
only works on X86 machine.

Signed-off-by: Peng Zhang
Signed-off-by: Long Wu
Reviewed-by: Chaoyong He
---
 drivers/net/nfp/nfp_ethdev.c        |   2 +-
 drivers/net/nfp/nfp_ethdev_vf.c     |   2 +-
 drivers/net/nfp/nfp_net_meta.c      |   1 +
 drivers/net/nfp/nfp_rxtx.c          |  10 ++
 drivers/net/nfp/nfp_rxtx.h          |   1 +
 drivers/net/nfp/nfp_rxtx_vec.h      |   4 +
 drivers/net/nfp/nfp_rxtx_vec_avx2.c | 252 ++++++++++++++++++++++++++++
 drivers/net/nfp/nfp_rxtx_vec_stub.c |   9 +
 8 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c
index a7b40af712..bd35df2dc9 100644
--- a/drivers/net/nfp/nfp_ethdev.c
+++ b/drivers/net/nfp/nfp_ethdev.c
@@ -969,7 +969,7 @@ nfp_net_ethdev_ops_mount(struct nfp_net_hw *hw,
 
 	eth_dev->dev_ops = &nfp_net_eth_dev_ops;
 	eth_dev->rx_queue_count = nfp_net_rx_queue_count;
-	eth_dev->rx_pkt_burst = &nfp_net_recv_pkts;
+	nfp_net_recv_pkts_set(eth_dev);
 }
 
 static int
diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c
index b955624ed6..cdf5da3af7 100644
--- a/drivers/net/nfp/nfp_ethdev_vf.c
+++ b/drivers/net/nfp/nfp_ethdev_vf.c
@@ -245,7 +245,7 @@ nfp_netvf_ethdev_ops_mount(struct nfp_net_hw *hw,
 
 	eth_dev->dev_ops = &nfp_netvf_eth_dev_ops;
 	eth_dev->rx_queue_count = nfp_net_rx_queue_count;
-	eth_dev->rx_pkt_burst = &nfp_net_recv_pkts;
+	nfp_net_recv_pkts_set(eth_dev);
 }
 
 static int
diff --git a/drivers/net/nfp/nfp_net_meta.c b/drivers/net/nfp/nfp_net_meta.c
index b31ef56f17..07c6758d33 100644
--- a/drivers/net/nfp/nfp_net_meta.c
+++ b/drivers/net/nfp/nfp_net_meta.c
@@ -80,6 +80,7 @@ nfp_net_meta_parse_single(uint8_t *meta_base,
 		rte_be32_t meta_header,
 		struct nfp_net_meta_parsed *meta)
 {
+	meta->flags = 0;
 	meta->flags |= (1 << NFP_NET_META_HASH);
 	meta->hash_type = rte_be_to_cpu_32(meta_header);
 	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index 1db79ad1cd..4fc3374987 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -17,6 +17,7 @@
 #include "nfp_ipsec.h"
 #include "nfp_logs.h"
 #include "nfp_net_meta.h"
+#include "nfp_rxtx_vec.h"
 
 /*
  * The bit format and map of nfp packet type for rxd.offload_info in Rx descriptor.
@@ -867,3 +868,12 @@ nfp_net_tx_queue_info_get(struct rte_eth_dev *dev,
 	info->conf.offloads = dev_info.tx_offload_capa &
 			dev->data->dev_conf.txmode.offloads;
 }
+
+void
+nfp_net_recv_pkts_set(struct rte_eth_dev *eth_dev)
+{
+	if (nfp_net_get_avx2_supported())
+		eth_dev->rx_pkt_burst = nfp_net_vec_avx2_recv_pkts;
+	else
+		eth_dev->rx_pkt_burst = nfp_net_recv_pkts;
+}
diff --git a/drivers/net/nfp/nfp_rxtx.h b/drivers/net/nfp/nfp_rxtx.h
index 3ddf717da0..fff8371991 100644
--- a/drivers/net/nfp/nfp_rxtx.h
+++ b/drivers/net/nfp/nfp_rxtx.h
@@ -244,5 +244,6 @@ void nfp_net_rx_queue_info_get(struct rte_eth_dev *dev,
 void nfp_net_tx_queue_info_get(struct rte_eth_dev *dev,
 		uint16_t queue_id,
 		struct rte_eth_txq_info *qinfo);
+void nfp_net_recv_pkts_set(struct rte_eth_dev *eth_dev);
 
 #endif /* __NFP_RXTX_H__ */
diff --git a/drivers/net/nfp/nfp_rxtx_vec.h b/drivers/net/nfp/nfp_rxtx_vec.h
index c92660f963..8720662744 100644
--- a/drivers/net/nfp/nfp_rxtx_vec.h
+++ b/drivers/net/nfp/nfp_rxtx_vec.h
@@ -10,4 +10,8 @@
 
 bool nfp_net_get_avx2_supported(void);
 
+uint16_t nfp_net_vec_avx2_recv_pkts(void *rx_queue,
+		struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts);
+
 #endif /* __NFP_RXTX_VEC_AVX2_H__ */
diff --git a/drivers/net/nfp/nfp_rxtx_vec_avx2.c b/drivers/net/nfp/nfp_rxtx_vec_avx2.c
index 50638e74ab..7c18213624 100644
--- a/drivers/net/nfp/nfp_rxtx_vec_avx2.c
+++ b/drivers/net/nfp/nfp_rxtx_vec_avx2.c
@@ -5,9 +5,14 @@
 
 #include
 
+#include
+#include
 #include
 #include
 
+#include "nfp_logs.h"
+#include "nfp_net_common.h"
+#include "nfp_net_meta.h"
 #include "nfp_rxtx_vec.h"
 
 bool
@@ -19,3 +24,250 @@ nfp_net_get_avx2_supported(void)
 
 	return false;
 }
+
+static inline void
+nfp_vec_avx2_recv_set_des1(struct nfp_net_rxq *rxq,
+		struct nfp_net_rx_desc *rxds,
+		struct rte_mbuf *rxb)
+{
+	__m128i dma;
+	__m128i dma_hi;
+	__m128i vaddr0;
+	__m128i hdr_room = _mm_set_epi64x(0, RTE_PKTMBUF_HEADROOM);
+
+	dma = _mm_add_epi64(_mm_loadu_si128((__m128i *)&rxb->buf_addr), hdr_room);
+	dma_hi = _mm_srli_epi64(dma, 32);
+	vaddr0 = _mm_unpacklo_epi32(dma_hi, dma);
+
+	_mm_storel_epi64((void *)rxds, vaddr0);
+
+	rxq->rd_p = (rxq->rd_p + 1) & (rxq->rx_count - 1);
+}
+
+static inline void
+nfp_vec_avx2_recv_set_des4(struct nfp_net_rxq *rxq,
+		struct nfp_net_rx_desc *rxds,
+		struct rte_mbuf **rxb)
+{
+	__m128i dma;
+	__m128i dma_hi;
+	__m128i vaddr0;
+	__m128i vaddr1;
+	__m128i vaddr2;
+	__m128i vaddr3;
+	__m128i vaddr0_1;
+	__m128i vaddr2_3;
+	__m256i vaddr0_3;
+	__m128i hdr_room = _mm_set_epi64x(0, RTE_PKTMBUF_HEADROOM);
+
+	dma = _mm_add_epi64(_mm_loadu_si128((__m128i *)&rxb[0]->buf_addr), hdr_room);
+	dma_hi = _mm_srli_epi64(dma, 32);
+	vaddr0 = _mm_unpacklo_epi32(dma_hi, dma);
+
+	dma = _mm_add_epi64(_mm_loadu_si128((__m128i *)&rxb[1]->buf_addr), hdr_room);
+	dma_hi = _mm_srli_epi64(dma, 32);
+	vaddr1 = _mm_unpacklo_epi32(dma_hi, dma);
+
+	dma = _mm_add_epi64(_mm_loadu_si128((__m128i *)&rxb[2]->buf_addr), hdr_room);
+	dma_hi = _mm_srli_epi64(dma, 32);
+	vaddr2 = _mm_unpacklo_epi32(dma_hi, dma);
+
+	dma = _mm_add_epi64(_mm_loadu_si128((__m128i *)&rxb[3]->buf_addr), hdr_room);
+	dma_hi = _mm_srli_epi64(dma, 32);
+	vaddr3 = _mm_unpacklo_epi32(dma_hi, dma);
+
+	vaddr0_1 = _mm_unpacklo_epi64(vaddr0, vaddr1);
+	vaddr2_3 = _mm_unpacklo_epi64(vaddr2, vaddr3);
+
+	vaddr0_3 = _mm256_inserti128_si256(_mm256_castsi128_si256(vaddr0_1),
+			vaddr2_3, 1);
+
+	_mm256_store_si256((void *)rxds, vaddr0_3);
+
+	rxq->rd_p = (rxq->rd_p + 4) & (rxq->rx_count - 1);
+}
+
+static inline void
+nfp_vec_avx2_recv_set_rxpkt1(struct nfp_net_rxq *rxq,
+		struct nfp_net_rx_desc *rxds,
+		struct rte_mbuf *rx_pkt)
+{
+	struct nfp_net_hw *hw = rxq->hw;
+	struct nfp_net_meta_parsed meta;
+
+	rx_pkt->data_len = rxds->rxd.data_len - NFP_DESC_META_LEN(rxds);
+	/* Size of the whole packet. Only 1 segment is supported. */
+	rx_pkt->pkt_len = rxds->rxd.data_len - NFP_DESC_META_LEN(rxds);
+
+	/* Fill the received mbuf with packet info. */
+	if (hw->rx_offset)
+		rx_pkt->data_off = RTE_PKTMBUF_HEADROOM + hw->rx_offset;
+	else
+		rx_pkt->data_off = RTE_PKTMBUF_HEADROOM + NFP_DESC_META_LEN(rxds);
+
+	rx_pkt->port = rxq->port_id;
+	rx_pkt->nb_segs = 1;
+	rx_pkt->next = NULL;
+
+	nfp_net_meta_parse(rxds, rxq, hw, rx_pkt, &meta);
+
+	/* Check the checksum flag. */
+	nfp_net_rx_cksum(rxq, rxds, rx_pkt);
+}
+
+static inline void
+nfp_vec_avx2_recv1(struct nfp_net_rxq *rxq,
+		struct nfp_net_rx_desc *rxds,
+		struct rte_mbuf *rxb,
+		struct rte_mbuf *rx_pkt)
+{
+	nfp_vec_avx2_recv_set_rxpkt1(rxq, rxds, rx_pkt);
+
+	nfp_vec_avx2_recv_set_des1(rxq, rxds, rxb);
+}
+
+static inline void
+nfp_vec_avx2_recv4(struct nfp_net_rxq *rxq,
+		struct nfp_net_rx_desc *rxds,
+		struct rte_mbuf **rxb,
+		struct rte_mbuf **rx_pkts)
+{
+	nfp_vec_avx2_recv_set_rxpkt1(rxq, rxds, rx_pkts[0]);
+	nfp_vec_avx2_recv_set_rxpkt1(rxq, rxds + 1, rx_pkts[1]);
+	nfp_vec_avx2_recv_set_rxpkt1(rxq, rxds + 2, rx_pkts[2]);
+	nfp_vec_avx2_recv_set_rxpkt1(rxq, rxds + 3, rx_pkts[3]);
+
+	nfp_vec_avx2_recv_set_des4(rxq, rxds, rxb);
+}
+
+static inline bool
+nfp_vec_avx2_recv_check_packets4(struct nfp_net_rx_desc *rxds)
+{
+	__m256i data = _mm256_loadu_si256((void *)rxds);
+
+	if ((_mm256_extract_epi8(data, 3) & PCIE_DESC_RX_DD) == 0 ||
+			(_mm256_extract_epi8(data, 11) & PCIE_DESC_RX_DD) == 0 ||
+			(_mm256_extract_epi8(data, 19) & PCIE_DESC_RX_DD) == 0 ||
+			(_mm256_extract_epi8(data, 27) & PCIE_DESC_RX_DD) == 0)
+		return false;
+
+	return true;
+}
+
+uint16_t
+nfp_net_vec_avx2_recv_pkts(void *rx_queue,
+		struct rte_mbuf **rx_pkts,
+		uint16_t nb_pkts)
+{
+	uint16_t avail;
+	uint16_t nb_hold;
+	bool burst_receive;
+	struct rte_mbuf **rxb;
+	struct nfp_net_rx_desc *rxds;
+	struct nfp_net_rxq *rxq = rx_queue;
+
+	if (unlikely(rxq == NULL)) {
+		PMD_RX_LOG(ERR, "RX Bad queue");
+		return 0;
+	}
+
+	avail = 0;
+	nb_hold = 0;
+	burst_receive = true;
+	while (avail < nb_pkts) {
+		rxds = &rxq->rxds[rxq->rd_p];
+		rxb = &rxq->rxbufs[rxq->rd_p].mbuf;
+
+		if ((_mm_extract_epi8(_mm_loadu_si128((void *)(rxds)), 3)
+				& PCIE_DESC_RX_DD) == 0)
+			goto recv_end;
+
+		rte_prefetch0(rxq->rxbufs[rxq->rd_p].mbuf);
+
+		if ((rxq->rd_p & 0x3) == 0) {
+			rte_prefetch0(&rxq->rxds[rxq->rd_p]);
+			rte_prefetch0(&rxq->rxbufs[rxq->rd_p]);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 1].mbuf);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 2].mbuf);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 3].mbuf);
+		}
+
+		if ((rxq->rd_p & 0x7) == 0) {
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 4].mbuf);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 5].mbuf);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 6].mbuf);
+			rte_prefetch0(rxq->rxbufs[rxq->rd_p + 7].mbuf);
+		}
+
+		/*
+		 * Receive a single packet when a burst of 4 is not possible:
+		 * 1. The Rx ring is about to reach its tail.
+		 * 2. Fewer than 4 packets are still requested.
+		 * 3. The descriptor address is not aligned on a 32-byte boundary.
+		 * 4. The Rx ring does not hold 4 packets, or allocating 4 mbufs failed.
+		 */
+		if ((rxq->rx_count - rxq->rd_p) < 4 ||
+				(nb_pkts - avail) < 4 ||
+				((uintptr_t)rxds & 0x1F) != 0 ||
+				!burst_receive) {
+			_mm_storel_epi64((void *)&rx_pkts[avail],
+					_mm_loadu_si128((void *)rxb));
+
+			/* Allocate a new mbuf into the software ring. */
+			if (rte_pktmbuf_alloc_bulk(rxq->mem_pool, rxb, 1) < 0) {
+				PMD_RX_LOG(DEBUG, "RX mbuf alloc failed port_id=%u queue_id=%hu",
+						rxq->port_id, rxq->qidx);
+				nfp_net_mbuf_alloc_failed(rxq);
+				goto recv_end;
+			}
+
+			nfp_vec_avx2_recv1(rxq, rxds, *rxb, rx_pkts[avail]);
+
+			avail++;
+			nb_hold++;
+			continue;
+		}
+
+		burst_receive = nfp_vec_avx2_recv_check_packets4(rxds);
+		if (!burst_receive)
+			continue;
+
+		_mm256_storeu_si256((void *)&rx_pkts[avail],
+				_mm256_loadu_si256((void *)rxb));
+
+		/* Allocate 4 new mbufs into the software ring. */
+		if (rte_pktmbuf_alloc_bulk(rxq->mem_pool, rxb, 4) < 0) {
+			burst_receive = false;
+			continue;
+		}
+
+		nfp_vec_avx2_recv4(rxq, rxds, rxb, &rx_pkts[avail]);
+
+		avail += 4;
+		nb_hold += 4;
+	}
+
+recv_end:
+	if (nb_hold == 0)
+		return nb_hold;
+
+	PMD_RX_LOG(DEBUG, "RX port_id=%u queue_id=%u, %d packets received",
+			rxq->port_id, (unsigned int)rxq->qidx, nb_hold);
+
+	nb_hold += rxq->nb_rx_hold;
+
+	/*
+	 * FL descriptors need to be written before incrementing the
+	 * FL queue WR pointer.
+	 */
+	rte_wmb();
+	if (nb_hold > rxq->rx_free_thresh) {
+		PMD_RX_LOG(DEBUG, "port=%hu queue=%hu nb_hold=%hu avail=%hu",
+				rxq->port_id, rxq->qidx, nb_hold, avail);
+		nfp_qcp_ptr_add(rxq->qcp_fl, NFP_QCP_WRITE_PTR, nb_hold);
+		nb_hold = 0;
+	}
+	rxq->nb_rx_hold = nb_hold;
+
+	return avail;
+}
diff --git a/drivers/net/nfp/nfp_rxtx_vec_stub.c b/drivers/net/nfp/nfp_rxtx_vec_stub.c
index 1bc55b67e0..c480f61ef0 100644
--- a/drivers/net/nfp/nfp_rxtx_vec_stub.c
+++ b/drivers/net/nfp/nfp_rxtx_vec_stub.c
@@ -6,6 +6,7 @@
 #include

 #include
+#include

 #include "nfp_rxtx_vec.h"

@@ -14,3 +15,11 @@ nfp_net_get_avx2_supported(void)
 {
 	return false;
 }
+
+uint16_t __rte_weak
+nfp_net_vec_avx2_recv_pkts(__rte_unused void *rx_queue,
+		__rte_unused struct rte_mbuf **rx_pkts,
+		__rte_unused uint16_t nb_pkts)
+{
+	return 0;
+}
-- 
2.39.1
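
For readers without the intrinsics at hand, the following is a scalar sketch of the checks the vector Rx path appears to rely on. It is not driver code. Every sketch_* name is hypothetical, and SKETCH_DESC_DD is an assumed value standing in for PCIE_DESC_RX_DD, which the patch takes from the driver headers. The 8-byte descriptor stride is inferred from the byte offsets 3, 11, 19 and 27 tested in nfp_vec_avx2_recv_check_packets4(). The burst_receive flag, the fourth condition in the patch, is left out of sketch_can_burst() for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DESC_DD   0x80u /* assumed value of the "descriptor done" flag */
#define SKETCH_DESC_SIZE 8u    /* stride inferred from offsets 3, 11, 19, 27 */
#define SKETCH_BURST     4u

/* Scalar equivalent of testing bytes 3, 11, 19 and 27 of one 32-byte load. */
static bool
sketch_four_descs_done(const uint8_t *descs)
{
	for (size_t i = 0; i < SKETCH_BURST; i++) {
		if ((descs[i * SKETCH_DESC_SIZE + 3] & SKETCH_DESC_DD) == 0)
			return false;
	}
	return true;
}

/* Pre-conditions checked before the 4-packet path is taken. */
static bool
sketch_can_burst(uint16_t rx_count, uint16_t rd_p, uint16_t remaining,
		uintptr_t desc_addr)
{
	if ((uint16_t)(rx_count - rd_p) < SKETCH_BURST)
		return false; /* fewer than 4 slots before the ring wraps */
	if (remaining < SKETCH_BURST)
		return false; /* caller asked for fewer than 4 more packets */
	if ((desc_addr & 0x1F) != 0)
		return false; /* an aligned 256-bit store needs 32-byte alignment */
	return true;
}

/* The read pointer is advanced with a mask, so the ring size must be a power of two. */
static uint16_t
sketch_ring_advance(uint16_t rd_p, uint16_t n, uint16_t rx_count)
{
	assert(rx_count != 0 && (rx_count & (rx_count - 1)) == 0);
	return (uint16_t)((rd_p + n) & (rx_count - 1));
}

/* Scalar equivalent of shift-right-by-32 plus unpacklo_epi32: upper half first. */
static void
sketch_pack_addr(uint64_t addr, uint32_t out[2])
{
	out[0] = (uint32_t)(addr >> 32);
	out[1] = (uint32_t)addr;
}
```

In this sketch a caller would test sketch_can_burst() first and fall back to one packet at a time whenever it returns false, which mirrors the order of the checks in nfp_net_vec_avx2_recv_pkts().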