From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 8F537455DF; Tue, 9 Jul 2024 10:24:48 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 04BE64329C; Tue, 9 Jul 2024 10:24:31 +0200 (CEST) Received: from NAM12-BN8-obe.outbound.protection.outlook.com (mail-bn8nam12on2104.outbound.protection.outlook.com [40.107.237.104]) by mails.dpdk.org (Postfix) with ESMTP id 0F07943281 for ; Tue, 9 Jul 2024 10:24:29 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=U5VdE0L5K0OQlPAYuws2W7eASnl3GevNwLZ1VqRja8cLkEzZ13IfakIqQB3Yr3W1+V7fYG7KMbJvYX5066t3KsGihp4IbC+oK7a5g4w1hA7tvi9qzU9180GKsyDU4PtULcoIw//ZgHofK2JGTcIvfpkMmGAX+HXfl0kP1Xtr4Y05PCb3+Dg7SElsoho2wOwSvQs87rMaL0Pm4NCzhDMEoVzh+lZbhxac0MqmDrbGWjH3PB9jc5QSnXoL5r2QWgCylbMB+uxVbl2z4eXOywvo5/Wkgg+fgOwWruEK4LQrHo3zlhd32QELcSPFS8LKD0CfJKcAyKkd1gTiEr2rdNUJdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=e0TL/kLR076LyJ3zmzevsib7AiACIt+FAyJ8jcIlX9A=; b=iJiVLGWjHF1KKcPRO/XLdfjYOAgZuOlL+6oC7ucZsaYQKCJiW1pTtwD1+Nl9iwsHrj1IBfskGdStG49mq462ZvK0s6PESbfxMGryhBBLsvcP6Fb/gItWZMHNxyTK7H/TsKCnd2mfDhvSRlfOPoBeXxkNlZ00jdmuHD9+yfXth+DuRUmlCypjnfe6Yx5eRPnRMYwX+jXQwfK5o2Cs3Cf2if8v+ijvurzOS+zcG/p6GwKzjYEiWc88l2i+2gNeOFq817ChM/098U7KS8Le2cEUP3GulOSjSwXONaBJM2eLdYAb17HcBCy9K+pNT2lYsE44jVd3ZHl3ohRIUOEjE3ruKQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=corigine.com; dmarc=pass action=none header.from=corigine.com; dkim=pass header.d=corigine.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=corigine.onmicrosoft.com; s=selector2-corigine-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=e0TL/kLR076LyJ3zmzevsib7AiACIt+FAyJ8jcIlX9A=; b=VQc/hV8xv3HXLDRIfT0Ri09ftrOqs4nJFNzOcr8YpZBw/V+zI2w5Y/yWn6w07BtZXi/oomn0jB5zZF+3ByqVF7XEZYEX9Y/Ffj07ICBxt5W1xbDwjbxqVnedNyZVfiPO0Ljgabk3wfdkMRopzcSc8fOPO4xovK7ZYJVXBQ2gipM= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=corigine.com; Received: from SJ0PR13MB5545.namprd13.prod.outlook.com (2603:10b6:a03:424::5) by PH7PR13MB6194.namprd13.prod.outlook.com (2603:10b6:510:245::10) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7741.36; Tue, 9 Jul 2024 08:24:27 +0000 Received: from SJ0PR13MB5545.namprd13.prod.outlook.com ([fe80::b900:5f05:766f:833]) by SJ0PR13MB5545.namprd13.prod.outlook.com ([fe80::b900:5f05:766f:833%4]) with mapi id 15.20.7741.033; Tue, 9 Jul 2024 08:24:27 +0000 From: Chaoyong He To: dev@dpdk.org Cc: oss-drivers@corigine.com, Long Wu , Chaoyong He Subject: [PATCH v4 3/5] net/nfp: support AVX2 Tx function Date: Tue, 9 Jul 2024 16:24:03 +0800 Message-Id: <20240709082405.248641-4-chaoyong.he@corigine.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20240709082405.248641-1-chaoyong.he@corigine.com> References: <20240709072921.246520-1-chaoyong.he@corigine.com> <20240709082405.248641-1-chaoyong.he@corigine.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: BYAPR05CA0064.namprd05.prod.outlook.com (2603:10b6:a03:74::41) To SJ0PR13MB5545.namprd13.prod.outlook.com (2603:10b6:a03:424::5) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SJ0PR13MB5545:EE_|PH7PR13MB6194:EE_ X-MS-Office365-Filtering-Correlation-Id: 8f64be1d-b964-4b28-3b46-08dc9ff08bf7 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230040|376014|52116014|1800799024|366016|38350700014; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?LwMd7ineb1UAfwjp9ynddD7cTFScQSM6SRQMnqPS1pVISEGqwjzC3FGypRFj?= =?us-ascii?Q?s5Mw3n+5QrXohp1I/0EAr6Uw5nFgccLGRQ8Sql1j8jft5ssZNt/l01Cq3LFR?= =?us-ascii?Q?19kxpSPsO5Ns1SU3jjxmd1QjxcrtuIU86IdkboVjvRP6ZUx5VzLTnMGS4i/i?= =?us-ascii?Q?LrKtyAfTfScUBuOBqbxC5HlMKWyqxA/qLTbKyLb/ZbjoKHcfq62pmrZnpxAc?= =?us-ascii?Q?STMgBSrYENFWlBib6AAu0dmAlEU2xxAQHaLmw+BIwAif5Ttcfk3hO9rktEII?= =?us-ascii?Q?NOJO0njCB11H/gtcwhWfRABjdBoTFrzbEVC6ApDuiL6Qm9nXEDV1N8mh9uUg?= =?us-ascii?Q?Wl+fhxhlYIJtE2Z39Fu8k4+Q/M/EqeGMBNB1SyWKLcsobvOugcmGglGBwzuo?= =?us-ascii?Q?9NPwcBhCHrkUaJVurmNixT59UK7pQ0eYmcc510wtTI0Y2mT3DREAf8sQOGXT?= =?us-ascii?Q?saZYsmM0AT/H17DYb+U8ScFJ3AytqJWPPNzIjbAmTOpFHzQ3Y2t2Ds0p/zoU?= =?us-ascii?Q?8AT0FyntX1LvmFURA8H4FOARAW0LtTzFSDM1RvqJR1iZhHTnUhj/YIWc80Q3?= =?us-ascii?Q?FVJmTOKm8Wximn7Bp7km8dKfH7dsXHa81tdDgcHMdfOfnZqcN0PvanLM+KfH?= =?us-ascii?Q?JfU4XZT/TGWvIj6bMCsx1n7b/OSlWOx94coQ4syBsylZ+q4TBwJaS/Fyvjca?= =?us-ascii?Q?KUWfujebEG2sMFA6Ek2xXPyqIbWVRLghxHr8u9qkHMrAUKxiuP1vS20C5FK8?= =?us-ascii?Q?qxCCfZ9ZW73/hgoMiZrSASmrm3fUjh9HNcvexL/fj73PaqpBDLlRyiHs1S4L?= =?us-ascii?Q?El04r359o41OlCG5FiWyQj87dDBWMH3qSRYmJIHC4nUiWbNVq1FU4rDlVZ/v?= =?us-ascii?Q?pVwlbHHY+/KywNWe5pAL36akASporrjBXP8NKsxwAmRsJ3HZHhVy+hUgqUmX?= =?us-ascii?Q?bQZ4+iO5Jr25mbG6PBpKceFGn/bX6ta3PKDTYMk6+k6JNaQlsJJ57CLg1xlo?= =?us-ascii?Q?trF1SXYqBxpgKckUjWm8q0Leg3WrXjkYFYUlNpKFD71RKu2ci/IZy+OHVCMe?= =?us-ascii?Q?7BLWa6TVn7U9eGnlWTU/lScAdrzpcWiyip1U4NTlSYL/Csd+nhhFO6dlIJ17?= =?us-ascii?Q?tV4+A/aNSAmeaouqrMYgVJxBDmW30g9XwZlO/PZlv5tzPiaTnY/dBAMjSmkj?= =?us-ascii?Q?bws6GAi20c2AijgzUhmJvcN09hVEg6t6ygSuQWlYTt6OknoDGyaM1rbP3EvG?= =?us-ascii?Q?7GEO2OLrQYKic2/HHuFCp6o2IryGn39vcLL9Df58ZzG3sWsAlaUJdC0xdDcv?= =?us-ascii?Q?VvtNruImlz/UlLUzzHQU+tYZB8bkAfGe743rkL4eS2EXJrqZd9WsZx5h+mlV?= =?us-ascii?Q?wfqhkJJzxISZTRIGOsJqSqSSmHH1P0jYFFNJvQeXqDez+8SuVQ=3D=3D?= X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:SJ0PR13MB5545.namprd13.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230040)(376014)(52116014)(1800799024)(366016)(38350700014); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?xQqfAlixm/0Hx7tYzFT4fOEPG3X17OFDqr4ohhLwrxy5yjc01qeMu/uHPJv5?= =?us-ascii?Q?rjSLEy92u00E9kvSS6WFaAAmojRN1cr8FLmyru9TbdGm5APKeXk9smj2LrXH?= =?us-ascii?Q?pRMKm9ZT9BZY1jxy3I4IAxIL6NPbfk6GENP0lhgo7jh62kDr8u6U11XtOGKM?= =?us-ascii?Q?JFw01EKeJtGGkvMjZ3rr+G7o7JSHRoYc6rs/CalmtkSxWhHwtktHsINOipDp?= =?us-ascii?Q?8617jsfFlzMnOZwkiuDjih8kG9+uTuugXPwZOQvKkn1oNKxEMBWst4N/Z+m/?= =?us-ascii?Q?mZo7YrmGNewn+6zamduH/BGrbPGe7oG+w3OrVmscribhh/pKpLWBXG8+8Nl+?= =?us-ascii?Q?h3WLatoHCwvJ5cRayFzUAR0vHUQCbL9Juj3xZFtGhh5u6GDIegr8cV+ChfU9?= =?us-ascii?Q?7Vkg/SlrDYPaggWGhjkZKWfD0bbviGlOxurxa0769AqjkXzEWKjQ7rEr7Bhx?= =?us-ascii?Q?0zrzKkrCTwAp40mJtatqz93xJKdQDCBD60c129UbYh1amyBieG/KOT6FvbRI?= =?us-ascii?Q?KnOr133hheiHxSrk084/l7Hc4y3cQkPRB3mcZVyVYKW1eKmq9kAVB9Qty1i0?= =?us-ascii?Q?g9LuSInrLtnZc18ESJaeTkANQKWB/QdQpM/U+4XitEmxF+WjjZinI1G+5anH?= =?us-ascii?Q?WZVMlplhPEgG44bwnvKbqqea/dXxIRjEJlgLSgD13ACwvJPtQU8jqJ7NDZjO?= =?us-ascii?Q?5qE6MvCnOwyYN7ZDRwk+sli/+WfWKVi+6BNMf2t7j8pO+UC/77vwIrd8A3Ky?= =?us-ascii?Q?Uqe02s06PmFesFh5fklQJ64CNrRBWEKeo+aZw6nUZE1jdY4qSrZpLPyhxjm+?= =?us-ascii?Q?D/wrGtF8T412zVD9ffsbSY2GXeS6k9xufSv/RxiZJUXPGrKwU3PwWZbuslLF?= =?us-ascii?Q?d+TvmQ836oHhzYobG/dQgQTI+JGuMIL/A9JhDc2r2OHf8ZxYAN6mTob4bMBo?= =?us-ascii?Q?S5tm94UzxsDOXBhaKRPPNYmQ2+0sLqhlQFNXSCAXB8MNNLGxnLuuh8bihhK9?= =?us-ascii?Q?fhoP8xduBOYu8vKVdeUCgqzPT5n3A7BcOBoi2W1pzaXPfTcQnHxH9VyOqu0H?= =?us-ascii?Q?4IlHXUhE/C6Sk8XiALdh2cjBr9oNUCqXbSBfDioHj7YFGw0DlfNmXCr1OnsB?= =?us-ascii?Q?b1f7EqGE6tLtMaGHI/hmk8zRhmVbQa7ORmhXyLAYr99STfUvRGc8tX/TclZ3?= =?us-ascii?Q?05ZGXz49YbFZBqlLUQZBi2gRPqT7nDTjkcyNMMkX5pY0tOK7RdVhGO6nT9Q2?= =?us-ascii?Q?KqtRbMrYUElcjB9vBv1rtlgE31AUKfgvNUIzlDV1F2rryC6DPRt8kTQ4Zwpl?= =?us-ascii?Q?uwuwlHrtRUBsNh+kh9z363wEXNFrD13Ei+zntIwJfsqhTekYRCm9/jz8PyJN?= =?us-ascii?Q?x4eYXzPYaFFwvsa7LiuNBmZzMfYN1kvmFj3NtYkqpRaiITBDL+YN4yazE3rY?= =?us-ascii?Q?ipvLFvEEr5PLDRoTanF9qIvU+b/4W/zDGsK+rQzED6NxY7Xyjvje6c6ebRob?= =?us-ascii?Q?t2bYNcjVYiEwPdFaRZER9us5mOBWvryc+sDeI2Z492U6h/NoIzEDMXnAovMa?= =?us-ascii?Q?Iv0y/q5oLG7lqJFW0lSGGtXRaPAiWwiOzZTWJZ34X4D7NHN/8uI/bonabgwi?= =?us-ascii?Q?qQ=3D=3D?= X-OriginatorOrg: corigine.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8f64be1d-b964-4b28-3b46-08dc9ff08bf7 X-MS-Exchange-CrossTenant-AuthSource: SJ0PR13MB5545.namprd13.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Jul 2024 08:24:27.2257 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: fe128f2c-073b-4c20-818e-7246a585940c X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: tPnCeZ3cGNHqXu6TNypZ4BdkYg2JVuBmopi3AFggKnbeXl53FUHmTN6Lt15lp/SWGk7v9FutF9BkJmsWjhQ95Oi9zfzwGgV+qzxlzXxsNDA= X-MS-Exchange-Transport-CrossTenantHeadersStamped: PH7PR13MB6194 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Long Wu Use AVX2 instructions to accelerate Tx performance. The acceleration only works on X86 machine. Signed-off-by: Long Wu Reviewed-by: Chaoyong He --- drivers/net/nfp/meson.build | 20 + drivers/net/nfp/nfdk/nfp_nfdk.h | 1 + drivers/net/nfp/nfdk/nfp_nfdk_dp.c | 12 + drivers/net/nfp/nfdk/nfp_nfdk_vec.h | 36 ++ drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c | 432 ++++++++++++++++++++ drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c | 14 + drivers/net/nfp/nfp_ethdev.c | 3 +- drivers/net/nfp/nfp_ethdev_vf.c | 3 +- drivers/net/nfp/nfp_rxtx.h | 5 +- drivers/net/nfp/nfp_rxtx_vec.h | 13 + drivers/net/nfp/nfp_rxtx_vec_avx2.c | 21 + drivers/net/nfp/nfp_rxtx_vec_stub.c | 16 + 12 files changed, 573 insertions(+), 3 deletions(-) create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec.h create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c create mode 100644 drivers/net/nfp/nfp_rxtx_vec.h create mode 100644 drivers/net/nfp/nfp_rxtx_vec_avx2.c create mode 100644 drivers/net/nfp/nfp_rxtx_vec_stub.c diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build index 7216c8dff9..58a066c2e3 100644 --- a/drivers/net/nfp/meson.build +++ b/drivers/net/nfp/meson.build @@ -17,6 +17,7 @@ sources = files( 'flower/nfp_flower_service.c', 'nfd3/nfp_nfd3_dp.c', 'nfdk/nfp_nfdk_dp.c', + 'nfdk/nfp_nfdk_vec_stub.c', 'nfpcore/nfp_cppcore.c', 'nfpcore/nfp_crc.c', 'nfpcore/nfp_elf.c', @@ -44,7 +45,26 @@ sources = files( 'nfp_net_flow.c', 'nfp_net_meta.c', 'nfp_rxtx.c', + 'nfp_rxtx_vec_stub.c', 'nfp_service.c', ) +if arch_subdir == 'x86' + includes += include_directories('../../common/nfp') + + avx2_sources = files( + 'nfdk/nfp_nfdk_vec_avx2_dp.c', + 'nfp_rxtx_vec_avx2.c', + ) + + nfp_avx2_lib = static_library('nfp_avx2_lib', + avx2_sources, + dependencies: [static_rte_ethdev, static_rte_bus_pci], + include_directories: includes, + c_args: [cflags, '-mavx2'] + ) + + objs += nfp_avx2_lib.extract_all_objects(recursive: true) +endif + deps += ['hash', 'security', 'common_nfp'] diff --git a/drivers/net/nfp/nfdk/nfp_nfdk.h b/drivers/net/nfp/nfdk/nfp_nfdk.h index 89a98d13f3..29d862f6f0 100644 --- a/drivers/net/nfp/nfdk/nfp_nfdk.h +++ b/drivers/net/nfp/nfdk/nfp_nfdk.h @@ -222,5 +222,6 @@ int nfp_net_nfdk_tx_maybe_close_block(struct nfp_net_txq *txq, int nfp_net_nfdk_set_meta_data(struct rte_mbuf *pkt, struct nfp_net_txq *txq, uint64_t *metadata); +void nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev); #endif /* __NFP_NFDK_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c index 173aabf0b9..2cea5688b3 100644 --- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c +++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c @@ -11,6 +11,8 @@ #include "../flower/nfp_flower.h" #include "../nfp_logs.h" #include "../nfp_net_meta.h" +#include "../nfp_rxtx_vec.h" +#include "nfp_nfdk_vec.h" #define NFDK_TX_DESC_GATHER_MAX 17 @@ -511,6 +513,7 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, dev->data->tx_queues[queue_idx] = txq; txq->hw = hw; txq->hw_priv = dev->process_private; + txq->simple_always = true; /* * Telling the HW about the physical address of the TX ring and number @@ -521,3 +524,12 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, return 0; } + +void +nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev) +{ + if (nfp_net_get_avx2_supported()) + eth_dev->tx_pkt_burst = nfp_net_nfdk_vec_avx2_xmit_pkts; + else + eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec.h b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h new file mode 100644 index 0000000000..14319d6cf6 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#ifndef __NFP_NFDK_VEC_H__ +#define __NFP_NFDK_VEC_H__ + +#include + +#include + +#include "../nfp_net_common.h" +#include "nfp_nfdk.h" + +static inline bool +nfp_net_nfdk_is_simple_packet(struct rte_mbuf *pkt, + struct nfp_net_hw *hw) +{ + if (pkt->data_len > NFDK_TX_MAX_DATA_PER_HEAD) + return false; + + if ((hw->super.cap & NFP_NET_CFG_CTRL_LSO_ANY) == 0) + return true; + + if ((pkt->ol_flags & RTE_MBUF_F_TX_TCP_SEG) == 0) + return true; + + return false; +} + +uint16_t nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + +#endif /* __NFP_NFDK_VEC_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c new file mode 100644 index 0000000000..6d1359fdb1 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c @@ -0,0 +1,432 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include +#include +#include + +#include "../nfp_logs.h" +#include "nfp_nfdk.h" +#include "nfp_nfdk_vec.h" + +/* + * One simple packet needs 2 descriptors so if send 4 packets driver will use + * 8 descriptors at once. + */ +#define NFDK_SIMPLE_BURST_DES_NUM 8 + +#define NFDK_SIMPLE_DES_TYPE (NFDK_DESC_TX_EOP | \ + (NFDK_DESC_TX_TYPE_HEAD & (NFDK_DESC_TX_TYPE_SIMPLE << 12))) + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(struct rte_mbuf *pkt, + struct nfp_net_txq *txq, + uint64_t *des_addr, + uint64_t *des_meta, + bool repr_flag) +{ + int ret; + __m128i dma_addr; + __m128i dma_hi; + __m128i data_off; + __m128i dlen_type; + uint64_t metadata; + + if (repr_flag) { + metadata = NFDK_DESC_TX_CHAIN_META; + } else { + ret = nfp_net_nfdk_set_meta_data(pkt, txq, &metadata); + if (unlikely(ret != 0)) + return ret; + } + + data_off = _mm_set_epi64x(0, pkt->data_off); + dma_addr = _mm_add_epi64(_mm_loadu_si128((__m128i *)&pkt->buf_addr), data_off); + dma_hi = _mm_srli_epi64(dma_addr, 32); + + dlen_type = _mm_set_epi64x(0, (pkt->data_len - 1) | NFDK_SIMPLE_DES_TYPE); + + *des_addr = _mm_extract_epi64(_mm_add_epi64(_mm_unpacklo_epi32(dma_hi, dma_addr), + _mm_slli_epi64(dlen_type, 16)), 0); + + *des_meta = nfp_net_nfdk_tx_cksum(txq, pkt, metadata); + + return 0; +} + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_send1(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf *pkt, + bool repr_flag) +{ + int ret; + __m128i des_data; + uint64_t des_addr; + uint64_t des_meta; + + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt, txq, &des_addr, + &des_meta, repr_flag); + if (unlikely(ret != 0)) + return ret; + + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt->data_len; + else + txq->data_pending = 0; + + des_data = _mm_set_epi64x(des_meta, des_addr); + + _mm_store_si128((void *)txds, des_data); + + return 0; +} + +static inline int +nfp_vec_avx2_nfdk_xmit_simple_send4(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf **pkt, + bool repr_flag) +{ + int ret; + uint16_t i; + __m256i des_data0_1; + __m256i des_data2_3; + uint64_t des_addr[4]; + uint64_t des_meta[4]; + + for (i = 0; i < 4; i++) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt[i], txq, + &des_addr[i], &des_meta[i], repr_flag); + if (unlikely(ret != 0)) + return ret; + } + + for (i = 0; i < 4; i++) { + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt[i]->data_len; + else + txq->data_pending = 0; + } + + des_data0_1 = _mm256_set_epi64x(des_meta[1], des_addr[1], des_meta[0], des_addr[0]); + des_data2_3 = _mm256_set_epi64x(des_meta[3], des_addr[3], des_meta[2], des_addr[2]); + + _mm256_store_si256((void *)txds, des_data0_1); + _mm256_store_si256((void *)(txds + 4), des_data2_3); + + return 0; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(struct rte_mbuf **mbuf, + struct rte_mbuf **tx_pkts) +{ + __m256i mbuf_room0_1; + __m256i mbuf_room2_3; + + mbuf_room0_1 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[1], 0, + (uintptr_t)tx_pkts[0]); + mbuf_room2_3 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[3], 0, + (uintptr_t)tx_pkts[2]); + + _mm256_store_si256((void *)mbuf, mbuf_room0_1); + _mm256_store_si256((void *)(mbuf + 4), mbuf_room2_3); +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_pkts(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts, + uint16_t simple_close, + bool repr_flag) +{ + int ret; + uint16_t npkts = 0; + uint16_t need_txds; + uint16_t free_descs; + struct rte_mbuf **lmbuf; + struct nfp_net_nfdk_tx_desc *ktxds; + + PMD_TX_LOG(DEBUG, "Working for queue %hu at pos %u and %hu packets", + txq->qidx, txq->wr_p, nb_pkts); + + need_txds = nb_pkts << 1; + if (nfp_net_nfdk_free_tx_desc(txq) < need_txds || nfp_net_nfdk_txq_full(txq)) + nfp_net_tx_free_bufs(txq); + + free_descs = nfp_net_nfdk_free_tx_desc(txq); + if (unlikely(free_descs < NFDK_TX_DESC_PER_SIMPLE_PKT)) { + if (unlikely(simple_close > 0)) + goto xmit_end; + + return 0; + } + + PMD_TX_LOG(DEBUG, "Queue: %hu. Sending %hu packets", txq->qidx, nb_pkts); + + /* Sending packets */ + while (npkts < nb_pkts && free_descs >= NFDK_TX_DESC_PER_SIMPLE_PKT) { + ktxds = &txq->ktxds[txq->wr_p]; + lmbuf = &txq->txbufs[txq->wr_p].mbuf; + + /* + * If can not send burst, just send one. + * 1. Tx ring will come to the tail. + * 2. Do not need to send 4 packets. + * 3. If pointer address unaligned on 32-bit boundary. + * 4. If free descriptors are not enough. + */ + if ((txq->tx_count - txq->wr_p) < NFDK_SIMPLE_BURST_DES_NUM || + (nb_pkts - npkts) < 4 || + ((uintptr_t)ktxds & 0x1F) != 0 || + free_descs < NFDK_SIMPLE_BURST_DES_NUM) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_send1(txq, + ktxds, tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free(*lmbuf); + + _mm_storel_epi64((void *)lmbuf, + _mm_loadu_si128((void *)&tx_pkts[npkts])); + npkts++; + free_descs -= NFDK_TX_DESC_PER_SIMPLE_PKT; + continue; + } + + ret = nfp_vec_avx2_nfdk_xmit_simple_send4(txq, ktxds, + &tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free_bulk(lmbuf, NFDK_SIMPLE_BURST_DES_NUM); + + nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(lmbuf, &tx_pkts[npkts]); + + npkts += 4; + free_descs -= NFDK_SIMPLE_BURST_DES_NUM; + } + +xmit_end: + /* Increment write pointers. Force memory write before we let HW know */ + rte_wmb(); + nfp_qcp_ptr_add(txq->qcp_q, NFP_QCP_WRITE_PTR, ((npkts << 1) + simple_close)); + + return npkts; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_close_block(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t i; + uint16_t wr_p; + uint16_t nop_slots; + __m128i zero_128 = _mm_setzero_si128(); + __m256i zero_256 = _mm256_setzero_si256(); + + wr_p = txq->wr_p; + nop_slots = D_BLOCK_CPL(wr_p); + + for (i = nop_slots; i >= 4; i -= 4, wr_p += 4) { + _mm256_store_si256((void *)&txq->ktxds[wr_p], zero_256); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 4); + _mm256_store_si256((void *)&txq->txbufs[wr_p], zero_256); + } + + for (; i >= 2; i -= 2, wr_p += 2) { + _mm_store_si128((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 2); + _mm_store_si128((void *)&txq->txbufs[wr_p], zero_128); + } + + for (; i >= 1; i--, wr_p++) { + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + } + + txq->data_pending = 0; + txq->wr_p = D_IDX(txq, txq->wr_p + nop_slots); + + (*simple_close) += nop_slots; +} + +static inline uint32_t +nfp_net_nfdk_vec_avx2_xmit_simple_prepare(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t wr_p; + __m128i zero_128 = _mm_setzero_si128(); + + wr_p = txq->wr_p; + + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + + txq->wr_p = D_IDX(txq, wr_p + 1); + (*simple_close)++; + + return txq->wr_p; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_check(struct nfp_net_txq *txq, + struct rte_mbuf *pkt, + bool *simple_flag, + bool *pending_flag, + uint16_t *data_pending, + uint32_t *wr_p, + uint16_t *simple_close) +{ + uint32_t data_pending_temp; + + /* Let the first descriptor index even before send simple packets */ + if (!(*simple_flag)) { + if ((*wr_p & 0x1) == 0x1) + *wr_p = nfp_net_nfdk_vec_avx2_xmit_simple_prepare(txq, simple_close); + + *simple_flag = true; + } + + /* Simple packets only need one close block operation */ + if (!(*pending_flag)) { + if ((*wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) == 0) { + *pending_flag = true; + return; + } + + data_pending_temp = *data_pending + pkt->data_len; + if (data_pending_temp > NFDK_TX_MAX_DATA_PER_BLOCK) { + nfp_net_nfdk_vec_avx2_xmit_simple_close_block(txq, simple_close); + *pending_flag = true; + return; + } + + *data_pending = data_pending_temp; + + *wr_p += 2; + } +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts, + uint16_t *simple_close) +{ + uint32_t wr_p; + uint16_t simple_idx; + struct rte_mbuf *pkt; + uint16_t data_pending; + bool simple_flag = false; + bool pending_flag = false; + uint16_t simple_count = 0; + + *simple_close = 0; + wr_p = txq->wr_p; + data_pending = txq->data_pending; + + for (simple_idx = head; simple_idx < nb_pkts; simple_idx++) { + pkt = tx_pkts[simple_idx]; + if (!nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + simple_count++; + if (!txq->simple_always) + nfp_net_nfdk_vec_avx2_xmit_simple_check(txq, pkt, &simple_flag, + &pending_flag, &data_pending, &wr_p, simple_close); + } + + return simple_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_others_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts) +{ + uint16_t others_idx; + struct rte_mbuf *pkt; + uint16_t others_count = 0; + + for (others_idx = head; others_idx < nb_pkts; others_idx++) { + pkt = tx_pkts[others_idx]; + if (nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + others_count++; + } + + return others_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_common(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + uint16_t i; + uint16_t avail = 0; + uint16_t simple_close; + uint16_t simple_count; + uint16_t simple_avail; + uint16_t others_count; + uint16_t others_avail; + struct nfp_net_txq *txq = tx_queue; + + for (i = 0; i < nb_pkts; i++) { + simple_count = nfp_net_nfdk_vec_avx2_xmit_simple_count(txq, tx_pkts, i, + nb_pkts, &simple_close); + if (simple_count > 0) { + if (!txq->simple_always) + txq->simple_always = true; + + simple_avail = nfp_net_nfdk_vec_avx2_xmit_simple_pkts(txq, + tx_pkts + i, simple_count, simple_close, + false); + + avail += simple_avail; + if (simple_avail != simple_count) + break; + + i += simple_count; + } + + if (i == nb_pkts) + break; + + others_count = nfp_net_nfdk_vec_avx2_xmit_others_count(txq, tx_pkts, + i, nb_pkts); + + if (txq->simple_always) + txq->simple_always = false; + + others_avail = nfp_net_nfdk_xmit_pkts_common(tx_queue, + tx_pkts + i, others_count, false); + + avail += others_avail; + if (others_avail != others_count) + break; + + i += others_count; + } + + return avail; +} + +uint16_t +nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + return nfp_net_nfdk_vec_avx2_xmit_common(tx_queue, tx_pkts, nb_pkts); +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c new file mode 100644 index 0000000000..146ec21d51 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include "nfp_nfdk_vec.h" + +uint16_t __rte_weak +nfp_net_nfdk_vec_avx2_xmit_pkts(__rte_unused void *tx_queue, + __rte_unused struct rte_mbuf **tx_pkts, + __rte_unused uint16_t nb_pkts) +{ + return 0; +} diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c index 8c0cacd3fc..a7b40af712 100644 --- a/drivers/net/nfp/nfp_ethdev.c +++ b/drivers/net/nfp/nfp_ethdev.c @@ -28,6 +28,7 @@ #include "nfp_ipsec.h" #include "nfp_logs.h" #include "nfp_net_flow.h" +#include "nfp_rxtx_vec.h" /* 64-bit per app capabilities */ #define NFP_NET_APP_CAP_SP_INDIFF RTE_BIT64(0) /* Indifferent to port speed */ @@ -964,7 +965,7 @@ nfp_net_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_net_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c index e7c18fe90a..b955624ed6 100644 --- a/drivers/net/nfp/nfp_ethdev_vf.c +++ b/drivers/net/nfp/nfp_ethdev_vf.c @@ -14,6 +14,7 @@ #include "nfp_logs.h" #include "nfp_net_common.h" +#include "nfp_rxtx_vec.h" #define NFP_VF_DRIVER_NAME net_nfp_vf @@ -240,7 +241,7 @@ nfp_netvf_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_netvf_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_rxtx.h b/drivers/net/nfp/nfp_rxtx.h index 9806384a63..3ddf717da0 100644 --- a/drivers/net/nfp/nfp_rxtx.h +++ b/drivers/net/nfp/nfp_rxtx.h @@ -69,9 +69,12 @@ struct __rte_aligned(64) nfp_net_txq { /** Used by NFDk only */ uint16_t data_pending; + /** Used by NFDk vector xmit only */ + bool simple_always; + /** * At this point 58 bytes have been used for all the fields in the - * TX critical path. We have room for 6 bytes and still all placed + * TX critical path. We have room for 5 bytes and still all placed * in a cache line. */ uint64_t dma; diff --git a/drivers/net/nfp/nfp_rxtx_vec.h b/drivers/net/nfp/nfp_rxtx_vec.h new file mode 100644 index 0000000000..c92660f963 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#ifndef __NFP_RXTX_VEC_AVX2_H__ +#define __NFP_RXTX_VEC_AVX2_H__ + +#include + +bool nfp_net_get_avx2_supported(void); + +#endif /* __NFP_RXTX_VEC_AVX2_H__ */ diff --git a/drivers/net/nfp/nfp_rxtx_vec_avx2.c b/drivers/net/nfp/nfp_rxtx_vec_avx2.c new file mode 100644 index 0000000000..50638e74ab --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_avx2.c @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include +#include + +#include "nfp_rxtx_vec.h" + +bool +nfp_net_get_avx2_supported(void) +{ + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + return true; + + return false; +} diff --git a/drivers/net/nfp/nfp_rxtx_vec_stub.c b/drivers/net/nfp/nfp_rxtx_vec_stub.c new file mode 100644 index 0000000000..1bc55b67e0 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_stub.c @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include + +#include "nfp_rxtx_vec.h" + +bool __rte_weak +nfp_net_get_avx2_supported(void) +{ + return false; +} -- 2.39.1