From: Chaoyong He
To: dev@dpdk.org
Cc: oss-drivers@corigine.com, Long Wu, Chaoyong He
Subject: [PATCH v2 2/4] net/nfp: support AVX2 Tx function
Date: Mon, 8 Jul 2024 13:58:52 +0800
Message-Id: <20240708055854.107739-3-chaoyong.he@corigine.com>
In-Reply-To: <20240708055854.107739-1-chaoyong.he@corigine.com>
References: <20240619025914.3216054-1-chaoyong.he@corigine.com> <20240708055854.107739-1-chaoyong.he@corigine.com>
List-Id: DPDK patches and discussions

From: Long Wu

Use AVX2 instructions to accelerate Tx performance. The acceleration
only works on x86 machines.

Signed-off-by: Long Wu
Reviewed-by: Chaoyong He
---
 drivers/net/nfp/meson.build                 |  15 +
 drivers/net/nfp/nfdk/nfp_nfdk.h             |   1 +
 drivers/net/nfp/nfdk/nfp_nfdk_dp.c          |  12 +
 drivers/net/nfp/nfdk/nfp_nfdk_vec.h         |  36 ++
 drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c | 432 ++++++++++++++++++++
 drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c    |  14 +
 drivers/net/nfp/nfp_ethdev.c                |   3 +-
 drivers/net/nfp/nfp_ethdev_vf.c             |   3 +-
 drivers/net/nfp/nfp_rxtx.h                  |   5 +-
 drivers/net/nfp/nfp_rxtx_vec.h              |  13 +
 drivers/net/nfp/nfp_rxtx_vec_avx2.c         |  21 +
 drivers/net/nfp/nfp_rxtx_vec_stub.c         |  16 +
 12 files changed, 568 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec.h
 create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c
 create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c
 create mode 100644 drivers/net/nfp/nfp_rxtx_vec.h
 create mode 100644 drivers/net/nfp/nfp_rxtx_vec_avx2.c
 create mode 100644 drivers/net/nfp/nfp_rxtx_vec_stub.c

diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
index b7e5beffb0..39bda04bc5 100644
--- a/drivers/net/nfp/meson.build
+++ b/drivers/net/nfp/meson.build
@@ -16,6 +16,7 @@ sources = files(
         'flower/nfp_flower_service.c',
         'nfd3/nfp_nfd3_dp.c',
         'nfdk/nfp_nfdk_dp.c',
+        'nfdk/nfp_nfdk_vec_stub.c',
         'nfpcore/nfp_cppcore.c',
         'nfpcore/nfp_crc.c',
         'nfpcore/nfp_elf.c',
@@ -43,8 +44,22 @@ sources = files(
         'nfp_net_flow.c',
         'nfp_net_meta.c',
         'nfp_rxtx.c',
+        'nfp_rxtx_vec_stub.c',
         'nfp_service.c',
         'nfp_trace.c',
 )
 
+if arch_subdir == 'x86'
+    avx2_sources = files(
+            'nfdk/nfp_nfdk_vec_avx2_dp.c',
+            'nfp_rxtx_vec_avx2.c',
+    )
+    nfp_avx2_lib = static_library('nfp_avx2_lib',
+            avx2_sources,
+            dependencies: [static_rte_ethdev, static_rte_bus_pci,
+                static_rte_common_nfp],
+            c_args: [cflags, '-mavx2'])
+    objs += nfp_avx2_lib.extract_all_objects(recursive: true)
+endif
+
 deps += ['hash', 'security', 'common_nfp']
diff --git a/drivers/net/nfp/nfdk/nfp_nfdk.h b/drivers/net/nfp/nfdk/nfp_nfdk.h
index 89a98d13f3..29d862f6f0 100644
--- 
a/drivers/net/nfp/nfdk/nfp_nfdk.h +++ b/drivers/net/nfp/nfdk/nfp_nfdk.h @@ -222,5 +222,6 @@ int nfp_net_nfdk_tx_maybe_close_block(struct nfp_net_txq *txq, int nfp_net_nfdk_set_meta_data(struct rte_mbuf *pkt, struct nfp_net_txq *txq, uint64_t *metadata); +void nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev); #endif /* __NFP_NFDK_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c index 173aabf0b9..2cea5688b3 100644 --- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c +++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c @@ -11,6 +11,8 @@ #include "../flower/nfp_flower.h" #include "../nfp_logs.h" #include "../nfp_net_meta.h" +#include "../nfp_rxtx_vec.h" +#include "nfp_nfdk_vec.h" #define NFDK_TX_DESC_GATHER_MAX 17 @@ -511,6 +513,7 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, dev->data->tx_queues[queue_idx] = txq; txq->hw = hw; txq->hw_priv = dev->process_private; + txq->simple_always = true; /* * Telling the HW about the physical address of the TX ring and number @@ -521,3 +524,12 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, return 0; } + +void +nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev) +{ + if (nfp_net_get_avx2_supported()) + eth_dev->tx_pkt_burst = nfp_net_nfdk_vec_avx2_xmit_pkts; + else + eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec.h b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h new file mode 100644 index 0000000000..14319d6cf6 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. 
+ */ + +#ifndef __NFP_NFDK_VEC_H__ +#define __NFP_NFDK_VEC_H__ + +#include + +#include + +#include "../nfp_net_common.h" +#include "nfp_nfdk.h" + +static inline bool +nfp_net_nfdk_is_simple_packet(struct rte_mbuf *pkt, + struct nfp_net_hw *hw) +{ + if (pkt->data_len > NFDK_TX_MAX_DATA_PER_HEAD) + return false; + + if ((hw->super.cap & NFP_NET_CFG_CTRL_LSO_ANY) == 0) + return true; + + if ((pkt->ol_flags & RTE_MBUF_F_TX_TCP_SEG) == 0) + return true; + + return false; +} + +uint16_t nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + +#endif /* __NFP_NFDK_VEC_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c new file mode 100644 index 0000000000..6d1359fdb1 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c @@ -0,0 +1,432 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include +#include +#include + +#include "../nfp_logs.h" +#include "nfp_nfdk.h" +#include "nfp_nfdk_vec.h" + +/* + * One simple packet needs 2 descriptors so if send 4 packets driver will use + * 8 descriptors at once. 
+ */ +#define NFDK_SIMPLE_BURST_DES_NUM 8 + +#define NFDK_SIMPLE_DES_TYPE (NFDK_DESC_TX_EOP | \ + (NFDK_DESC_TX_TYPE_HEAD & (NFDK_DESC_TX_TYPE_SIMPLE << 12))) + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(struct rte_mbuf *pkt, + struct nfp_net_txq *txq, + uint64_t *des_addr, + uint64_t *des_meta, + bool repr_flag) +{ + int ret; + __m128i dma_addr; + __m128i dma_hi; + __m128i data_off; + __m128i dlen_type; + uint64_t metadata; + + if (repr_flag) { + metadata = NFDK_DESC_TX_CHAIN_META; + } else { + ret = nfp_net_nfdk_set_meta_data(pkt, txq, &metadata); + if (unlikely(ret != 0)) + return ret; + } + + data_off = _mm_set_epi64x(0, pkt->data_off); + dma_addr = _mm_add_epi64(_mm_loadu_si128((__m128i *)&pkt->buf_addr), data_off); + dma_hi = _mm_srli_epi64(dma_addr, 32); + + dlen_type = _mm_set_epi64x(0, (pkt->data_len - 1) | NFDK_SIMPLE_DES_TYPE); + + *des_addr = _mm_extract_epi64(_mm_add_epi64(_mm_unpacklo_epi32(dma_hi, dma_addr), + _mm_slli_epi64(dlen_type, 16)), 0); + + *des_meta = nfp_net_nfdk_tx_cksum(txq, pkt, metadata); + + return 0; +} + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_send1(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf *pkt, + bool repr_flag) +{ + int ret; + __m128i des_data; + uint64_t des_addr; + uint64_t des_meta; + + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt, txq, &des_addr, + &des_meta, repr_flag); + if (unlikely(ret != 0)) + return ret; + + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt->data_len; + else + txq->data_pending = 0; + + des_data = _mm_set_epi64x(des_meta, des_addr); + + _mm_store_si128((void *)txds, des_data); + + return 0; +} + +static inline int +nfp_vec_avx2_nfdk_xmit_simple_send4(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf **pkt, + bool repr_flag) +{ + int ret; + uint16_t i; + __m256i des_data0_1; + __m256i des_data2_3; + 
uint64_t des_addr[4]; + uint64_t des_meta[4]; + + for (i = 0; i < 4; i++) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt[i], txq, + &des_addr[i], &des_meta[i], repr_flag); + if (unlikely(ret != 0)) + return ret; + } + + for (i = 0; i < 4; i++) { + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt[i]->data_len; + else + txq->data_pending = 0; + } + + des_data0_1 = _mm256_set_epi64x(des_meta[1], des_addr[1], des_meta[0], des_addr[0]); + des_data2_3 = _mm256_set_epi64x(des_meta[3], des_addr[3], des_meta[2], des_addr[2]); + + _mm256_store_si256((void *)txds, des_data0_1); + _mm256_store_si256((void *)(txds + 4), des_data2_3); + + return 0; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(struct rte_mbuf **mbuf, + struct rte_mbuf **tx_pkts) +{ + __m256i mbuf_room0_1; + __m256i mbuf_room2_3; + + mbuf_room0_1 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[1], 0, + (uintptr_t)tx_pkts[0]); + mbuf_room2_3 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[3], 0, + (uintptr_t)tx_pkts[2]); + + _mm256_store_si256((void *)mbuf, mbuf_room0_1); + _mm256_store_si256((void *)(mbuf + 4), mbuf_room2_3); +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_pkts(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts, + uint16_t simple_close, + bool repr_flag) +{ + int ret; + uint16_t npkts = 0; + uint16_t need_txds; + uint16_t free_descs; + struct rte_mbuf **lmbuf; + struct nfp_net_nfdk_tx_desc *ktxds; + + PMD_TX_LOG(DEBUG, "Working for queue %hu at pos %u and %hu packets", + txq->qidx, txq->wr_p, nb_pkts); + + need_txds = nb_pkts << 1; + if (nfp_net_nfdk_free_tx_desc(txq) < need_txds || nfp_net_nfdk_txq_full(txq)) + nfp_net_tx_free_bufs(txq); + + free_descs = nfp_net_nfdk_free_tx_desc(txq); + if (unlikely(free_descs < NFDK_TX_DESC_PER_SIMPLE_PKT)) { + if (unlikely(simple_close > 0)) + goto xmit_end; + + return 0; + } + + PMD_TX_LOG(DEBUG, "Queue: 
%hu. Sending %hu packets", txq->qidx, nb_pkts); + + /* Sending packets */ + while (npkts < nb_pkts && free_descs >= NFDK_TX_DESC_PER_SIMPLE_PKT) { + ktxds = &txq->ktxds[txq->wr_p]; + lmbuf = &txq->txbufs[txq->wr_p].mbuf; + + /* + * If can not send burst, just send one. + * 1. Tx ring will come to the tail. + * 2. Do not need to send 4 packets. + * 3. If pointer address unaligned on 32-bit boundary. + * 4. If free descriptors are not enough. + */ + if ((txq->tx_count - txq->wr_p) < NFDK_SIMPLE_BURST_DES_NUM || + (nb_pkts - npkts) < 4 || + ((uintptr_t)ktxds & 0x1F) != 0 || + free_descs < NFDK_SIMPLE_BURST_DES_NUM) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_send1(txq, + ktxds, tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free(*lmbuf); + + _mm_storel_epi64((void *)lmbuf, + _mm_loadu_si128((void *)&tx_pkts[npkts])); + npkts++; + free_descs -= NFDK_TX_DESC_PER_SIMPLE_PKT; + continue; + } + + ret = nfp_vec_avx2_nfdk_xmit_simple_send4(txq, ktxds, + &tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free_bulk(lmbuf, NFDK_SIMPLE_BURST_DES_NUM); + + nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(lmbuf, &tx_pkts[npkts]); + + npkts += 4; + free_descs -= NFDK_SIMPLE_BURST_DES_NUM; + } + +xmit_end: + /* Increment write pointers. 
Force memory write before we let HW know */ + rte_wmb(); + nfp_qcp_ptr_add(txq->qcp_q, NFP_QCP_WRITE_PTR, ((npkts << 1) + simple_close)); + + return npkts; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_close_block(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t i; + uint16_t wr_p; + uint16_t nop_slots; + __m128i zero_128 = _mm_setzero_si128(); + __m256i zero_256 = _mm256_setzero_si256(); + + wr_p = txq->wr_p; + nop_slots = D_BLOCK_CPL(wr_p); + + for (i = nop_slots; i >= 4; i -= 4, wr_p += 4) { + _mm256_store_si256((void *)&txq->ktxds[wr_p], zero_256); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 4); + _mm256_store_si256((void *)&txq->txbufs[wr_p], zero_256); + } + + for (; i >= 2; i -= 2, wr_p += 2) { + _mm_store_si128((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 2); + _mm_store_si128((void *)&txq->txbufs[wr_p], zero_128); + } + + for (; i >= 1; i--, wr_p++) { + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + } + + txq->data_pending = 0; + txq->wr_p = D_IDX(txq, txq->wr_p + nop_slots); + + (*simple_close) += nop_slots; +} + +static inline uint32_t +nfp_net_nfdk_vec_avx2_xmit_simple_prepare(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t wr_p; + __m128i zero_128 = _mm_setzero_si128(); + + wr_p = txq->wr_p; + + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + + txq->wr_p = D_IDX(txq, wr_p + 1); + (*simple_close)++; + + return txq->wr_p; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_check(struct nfp_net_txq *txq, + struct rte_mbuf *pkt, + bool *simple_flag, + bool *pending_flag, + uint16_t *data_pending, + uint32_t *wr_p, + uint16_t *simple_close) +{ + uint32_t data_pending_temp; + + /* Let the first descriptor index even before send simple 
packets */ + if (!(*simple_flag)) { + if ((*wr_p & 0x1) == 0x1) + *wr_p = nfp_net_nfdk_vec_avx2_xmit_simple_prepare(txq, simple_close); + + *simple_flag = true; + } + + /* Simple packets only need one close block operation */ + if (!(*pending_flag)) { + if ((*wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) == 0) { + *pending_flag = true; + return; + } + + data_pending_temp = *data_pending + pkt->data_len; + if (data_pending_temp > NFDK_TX_MAX_DATA_PER_BLOCK) { + nfp_net_nfdk_vec_avx2_xmit_simple_close_block(txq, simple_close); + *pending_flag = true; + return; + } + + *data_pending = data_pending_temp; + + *wr_p += 2; + } +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts, + uint16_t *simple_close) +{ + uint32_t wr_p; + uint16_t simple_idx; + struct rte_mbuf *pkt; + uint16_t data_pending; + bool simple_flag = false; + bool pending_flag = false; + uint16_t simple_count = 0; + + *simple_close = 0; + wr_p = txq->wr_p; + data_pending = txq->data_pending; + + for (simple_idx = head; simple_idx < nb_pkts; simple_idx++) { + pkt = tx_pkts[simple_idx]; + if (!nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + simple_count++; + if (!txq->simple_always) + nfp_net_nfdk_vec_avx2_xmit_simple_check(txq, pkt, &simple_flag, + &pending_flag, &data_pending, &wr_p, simple_close); + } + + return simple_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_others_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts) +{ + uint16_t others_idx; + struct rte_mbuf *pkt; + uint16_t others_count = 0; + + for (others_idx = head; others_idx < nb_pkts; others_idx++) { + pkt = tx_pkts[others_idx]; + if (nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + others_count++; + } + + return others_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_common(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + 
uint16_t i; + uint16_t avail = 0; + uint16_t simple_close; + uint16_t simple_count; + uint16_t simple_avail; + uint16_t others_count; + uint16_t others_avail; + struct nfp_net_txq *txq = tx_queue; + + for (i = 0; i < nb_pkts; i++) { + simple_count = nfp_net_nfdk_vec_avx2_xmit_simple_count(txq, tx_pkts, i, + nb_pkts, &simple_close); + if (simple_count > 0) { + if (!txq->simple_always) + txq->simple_always = true; + + simple_avail = nfp_net_nfdk_vec_avx2_xmit_simple_pkts(txq, + tx_pkts + i, simple_count, simple_close, + false); + + avail += simple_avail; + if (simple_avail != simple_count) + break; + + i += simple_count; + } + + if (i == nb_pkts) + break; + + others_count = nfp_net_nfdk_vec_avx2_xmit_others_count(txq, tx_pkts, + i, nb_pkts); + + if (txq->simple_always) + txq->simple_always = false; + + others_avail = nfp_net_nfdk_xmit_pkts_common(tx_queue, + tx_pkts + i, others_count, false); + + avail += others_avail; + if (others_avail != others_count) + break; + + i += others_count; + } + + return avail; +} + +uint16_t +nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + return nfp_net_nfdk_vec_avx2_xmit_common(tx_queue, tx_pkts, nb_pkts); +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c new file mode 100644 index 0000000000..146ec21d51 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. 
+ */ + +#include "nfp_nfdk_vec.h" + +uint16_t __rte_weak +nfp_net_nfdk_vec_avx2_xmit_pkts(__rte_unused void *tx_queue, + __rte_unused struct rte_mbuf **tx_pkts, + __rte_unused uint16_t nb_pkts) +{ + return 0; +} diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c index 8c0cacd3fc..a7b40af712 100644 --- a/drivers/net/nfp/nfp_ethdev.c +++ b/drivers/net/nfp/nfp_ethdev.c @@ -28,6 +28,7 @@ #include "nfp_ipsec.h" #include "nfp_logs.h" #include "nfp_net_flow.h" +#include "nfp_rxtx_vec.h" /* 64-bit per app capabilities */ #define NFP_NET_APP_CAP_SP_INDIFF RTE_BIT64(0) /* Indifferent to port speed */ @@ -964,7 +965,7 @@ nfp_net_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_net_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c index e7c18fe90a..b955624ed6 100644 --- a/drivers/net/nfp/nfp_ethdev_vf.c +++ b/drivers/net/nfp/nfp_ethdev_vf.c @@ -14,6 +14,7 @@ #include "nfp_logs.h" #include "nfp_net_common.h" +#include "nfp_rxtx_vec.h" #define NFP_VF_DRIVER_NAME net_nfp_vf @@ -240,7 +241,7 @@ nfp_netvf_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_netvf_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_rxtx.h b/drivers/net/nfp/nfp_rxtx.h index 9806384a63..3ddf717da0 100644 --- a/drivers/net/nfp/nfp_rxtx.h +++ b/drivers/net/nfp/nfp_rxtx.h @@ -69,9 +69,12 @@ struct __rte_aligned(64) nfp_net_txq { /** Used by NFDk only */ uint16_t data_pending; + /** Used by NFDk vector xmit only */ + bool simple_always; + /** * 
At this point 58 bytes have been used for all the fields in the - * TX critical path. We have room for 6 bytes and still all placed + * TX critical path. We have room for 5 bytes and still all placed * in a cache line. */ uint64_t dma; diff --git a/drivers/net/nfp/nfp_rxtx_vec.h b/drivers/net/nfp/nfp_rxtx_vec.h new file mode 100644 index 0000000000..c92660f963 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#ifndef __NFP_RXTX_VEC_AVX2_H__ +#define __NFP_RXTX_VEC_AVX2_H__ + +#include + +bool nfp_net_get_avx2_supported(void); + +#endif /* __NFP_RXTX_VEC_AVX2_H__ */ diff --git a/drivers/net/nfp/nfp_rxtx_vec_avx2.c b/drivers/net/nfp/nfp_rxtx_vec_avx2.c new file mode 100644 index 0000000000..50638e74ab --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_avx2.c @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include +#include + +#include "nfp_rxtx_vec.h" + +bool +nfp_net_get_avx2_supported(void) +{ + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + return true; + + return false; +} diff --git a/drivers/net/nfp/nfp_rxtx_vec_stub.c b/drivers/net/nfp/nfp_rxtx_vec_stub.c new file mode 100644 index 0000000000..1bc55b67e0 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_stub.c @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include + +#include "nfp_rxtx_vec.h" + +bool __rte_weak +nfp_net_get_avx2_supported(void) +{ + return false; +} -- 2.39.1
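
Note on the descriptor packing: the SSE sequence in nfp_net_nfdk_vec_avx2_xmit_simple_set_des2() that builds the first 64-bit descriptor word is dense, so a scalar reading of it may help review. The sketch below is only an illustration derived from the intrinsics in this patch, not driver code. The function name sketch_pack_simple_desc and the type_bits parameter are hypothetical; type_bits stands in for NFDK_SIMPLE_DES_TYPE, whose numeric value is not shown here, and dma_addr stands for whatever address the vector code has already computed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar reading of the SSE sequence that builds the first 64-bit word
 * of a "simple" Tx descriptor. Hypothetical helper for illustration:
 * type_bits replaces NFDK_SIMPLE_DES_TYPE, whose value is assumed to be
 * supplied by the caller.
 */
static inline uint64_t
sketch_pack_simple_desc(uint64_t dma_addr, uint16_t data_len, uint16_t type_bits)
{
	uint64_t addr_hi;
	uint64_t addr_lo;
	uint64_t dlen_type;

	/* The patch stores (data_len - 1), so a zero length is not valid. */
	assert(data_len > 0);

	/* _mm_srli_epi64(dma_addr, 32): upper half of the address. */
	addr_hi = dma_addr >> 32;

	/* Lower 32 bits of the address. */
	addr_lo = dma_addr & UINT32_MAX;

	/* (pkt->data_len - 1) | NFDK_SIMPLE_DES_TYPE */
	dlen_type = (uint64_t)((data_len - 1) | type_bits);

	/*
	 * _mm_unpacklo_epi32(dma_hi, dma_addr) places addr_hi in bits 0..31
	 * and addr_lo in bits 32..63. The 64-bit add then drops dlen_type
	 * into bits 16..31, which stays separate from the address only if
	 * addr_hi fits in 16 bits.
	 */
	return (addr_hi | (addr_lo << 32)) + (dlen_type << 16);
}
```

For example, with dma_addr 0x0000123456789abc, data_len 64 and an assumed type_bits of 0x8000, the word is 0x56789abc803f1234: the high address bits 0x1234 sit in the low 16 bits, the length/type field 0x803f sits above them, and the low 32 address bits occupy the upper half.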