From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2FB47454A2; Wed, 19 Jun 2024 04:59:48 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 825EA410E7; Wed, 19 Jun 2024 04:59:38 +0200 (CEST) Received: from NAM10-BN7-obe.outbound.protection.outlook.com (mail-bn7nam10on2095.outbound.protection.outlook.com [40.107.92.95]) by mails.dpdk.org (Postfix) with ESMTP id 09CE741149 for ; Wed, 19 Jun 2024 04:59:37 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=L65r0M0uiTMvpjwToH1BJTMDhAbWqS2Vta4G4AscTzB0YKt1u+A8XeL+tRqMPbNfjeXr7dwsD3/HeCLqEzzQfDw4pD0PjfBhrwQ6/X85PIaI9mbLXdg+p16b4S8j7C9Lw6BHyRBNjmWNcHyXHaqv0PPgrffQ8BypIpAd2VwNZ5s9CD4KBZa7161K5zxeiMWtirccJxCvbQ1ApPn5CXxgiXCp/qMNkBFBFSkJUS6NslTTwzAFlmhm7SPdx56mnWnT16PklQAUv6EzfANAWz5XobJ7Zs5hLKBFnrCFdrBgbFzFc8VT30UlVOL+vnfUgxJ+cNQEj4qViZBr7uie0KQs3Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=DukMFl5u1XwhOsqYug23GAWQREfxkrrpNSXakfQWa2I=; b=cc6aKOAd9Mp4ioUB/ci5WpELj8AMhaIdd68DBF2ZlSxQE3YCDYwH/O27rkejs7X8BTqYLcIcVGxCnk0MUc/8CTYbRvj6gGQp2dPhhurvnZiBUL+3FOFhrIoxZA2WcOqBUAJDLTRJdxyTHD/wzku+fmwVeohZTwiVpLD9mb/NO0Skd2KqHiTISc/5vWQmXlZf9kO8VPB6gcMe8cei0f+d3J+nGZqORFHisDHLxNjDZGfwsQnWP8+liA7ZY/4ZHejEyDViDfN7cjBts5nZnprjxN8FwBYisP4M1QUh1j0Hn8+zdg8xkCtf8EVeZX092FrgP6Vg/6bDoUh6KfESN8EO0g== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=corigine.com; dmarc=pass action=none header.from=corigine.com; dkim=pass header.d=corigine.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=corigine.onmicrosoft.com; s=selector2-corigine-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=DukMFl5u1XwhOsqYug23GAWQREfxkrrpNSXakfQWa2I=; b=WfJX8ol8PwxMsEI8EJcXJop55KKl1Dhb7htms78x+H0su/4mPu9Eio4vVtg6hvBtLIxdHz1kMcoNRgvOyjcBjXRwZxRw3Lfnv4n8zFe4IJsrSOCXlZICkJyi/BgYLAerlalKdx5OTUb+XG9zGgoUJsbAHsTgdPfPE50aM0z16Oo= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=corigine.com; Received: from SJ0PR13MB5545.namprd13.prod.outlook.com (2603:10b6:a03:424::5) by SN4PR13MB5775.namprd13.prod.outlook.com (2603:10b6:806:217::6) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7698.19; Wed, 19 Jun 2024 02:59:34 +0000 Received: from SJ0PR13MB5545.namprd13.prod.outlook.com ([fe80::b900:5f05:766f:833]) by SJ0PR13MB5545.namprd13.prod.outlook.com ([fe80::b900:5f05:766f:833%4]) with mapi id 15.20.7677.030; Wed, 19 Jun 2024 02:59:34 +0000 From: Chaoyong He To: dev@dpdk.org Cc: oss-drivers@corigine.com, Long Wu , Chaoyong He Subject: [PATCH 2/4] net/nfp: support AVX2 Tx function Date: Wed, 19 Jun 2024 10:59:12 +0800 Message-Id: <20240619025914.3216054-3-chaoyong.he@corigine.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20240619025914.3216054-1-chaoyong.he@corigine.com> References: <20240619025914.3216054-1-chaoyong.he@corigine.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: SJ0PR13CA0189.namprd13.prod.outlook.com (2603:10b6:a03:2c3::14) To SJ0PR13MB5545.namprd13.prod.outlook.com (2603:10b6:a03:424::5) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SJ0PR13MB5545:EE_|SN4PR13MB5775:EE_ X-MS-Office365-Filtering-Correlation-Id: c6e8772f-e5c0-4c5e-fc58-08dc900bd958 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230037|376011|52116011|366013|1800799021|38350700011; X-Microsoft-Antispam-Message-Info: =?us-ascii?Q?M+h7gpO8utdpxZEo2LSQUs5Ilr6s3ZsPEh560bk0T3VrMyX4nX7wXzEnDVDd?= =?us-ascii?Q?JYbR0xbFt7SGVB6+0Ln6WuTdfGnQHcfC5LJ/RevXcpNnJFmz/7xKAbzgZ3WF?= =?us-ascii?Q?XdflzqGCCB/MrsMrkoZnLepD6bq1sxQJR/lAlDjZOyd622E0OMgIwMXdrPiR?= =?us-ascii?Q?GKcEKMkd0NsL9SPTTSm3mqQ2BJS+8bS4Ga7c3l8k7DZc5C+2i0M1wvEJq4ay?= =?us-ascii?Q?ar1bxNuItm9DqaEpDJrk+i47cziS/qvK85THIA07elYUy+/fQSRMSlkal7Ff?= =?us-ascii?Q?8h4VnaRK2caoC8lsOjj35c8RM7zS5/qE+znq+Sio9gHtOx48+rktHPIdpGt0?= =?us-ascii?Q?OE2Tea1eIrCjF+ORlvsQu/UpHtRrqbnOiMlImcdABchQUsQ3YqifrTEHQV4G?= =?us-ascii?Q?WbxBgmCuII4uLyP5wVO1WlBk+hn8FzROBy4JbSlOs+zz9W+T35jJAsRD5DR3?= =?us-ascii?Q?FbnborcCYdxrgslqk6gIHHi6ABoWDyf8JMQP4JoU7yQa86xmXZtANiJNgiJO?= =?us-ascii?Q?ZEFAflbU+f2UX/KJGRAwsCVyzslT2wDsF1hqLUaPirea6qeL0qkIctJ3Kmmr?= =?us-ascii?Q?9gJN4MTLIFUyHygO7shmXGh4KBYP2KjQvPvMQRlO347vjjwx6iTqKbGGl5D4?= =?us-ascii?Q?riya2J2wKJcScnyOTiQBJg6kUMDu/zSDzgso3Ck3rRIkvOBSG4JM9nnx64++?= =?us-ascii?Q?NCM2MxtPhmh7+qeSgycne8b3EVZWVairJxMec5S+Jpk9bJ96kGE8FEnaxQ30?= =?us-ascii?Q?nH6JIfNW63u9Cc3J8FufOjeS+zbifBqpoe0CPCp9ZYIJdCvIRt8s4E4NFNlZ?= =?us-ascii?Q?ekjBKKjgcssNEsiDcQKtgqQ5dcZw4oh1OKZmwriDl2VraQGVALSBRPHUdr2Y?= =?us-ascii?Q?PAg0mNok6/uub/lpiDyCGGe1FvqCy+V0O4Ua4vg2lloEv/9rrcn1AFKZB/Z8?= =?us-ascii?Q?WY1yXUIl/8FV5QJBxdCAr03AxEdAHUA70yYS9WAN61KoNucwOq3wBhxczFVS?= =?us-ascii?Q?FgMCzQ4lewjElIcLZB8F7v6bk7+TxTYD/VQEaP5d7hNPMl66ey1jzvU/rhC+?= =?us-ascii?Q?8HDpuKV5QME2am6Buxz8xwsiu7mEOwTPK3ddcUzJrb4iMT66M7ae0v7o4jEZ?= =?us-ascii?Q?NcnFKVDsDIDH1aZr9Et8eyg/qeiV5PC7hnDVpiQ4po4HsGyx0GlLyPe0DvIk?= =?us-ascii?Q?ywebr+J2btKcTv9nst2pJz7A50CjWbNiDb1FvJpzFImz3z6SFRfb2DtZZTTQ?= =?us-ascii?Q?XDJK1R6I1KTjKYBozfBig2GN/lmyDBm123r0r6w0ELV0Cr2vz5J9zhfPwBn+?= =?us-ascii?Q?DV2ArwXiG+7b3XQwUJtMvGAN7dVsdxFLEsAr+0+z5PzmDS6Oxfy/n065NPJC?= =?us-ascii?Q?BJrPHuM=3D?= X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:SJ0PR13MB5545.namprd13.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230037)(376011)(52116011)(366013)(1800799021)(38350700011); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?us-ascii?Q?wsdeqSySfxi0N/X/55NXOUEe9OuAPkO6R4qUUuzc6mwTwZAX3c63pLm2UXbx?= =?us-ascii?Q?FiqNo7kXDHs9DSQjrfht92udsyU9BynxIgIrx4AIrztp1duCnV/kuqm3FZ4B?= =?us-ascii?Q?pQHuKQC2b1egdQcEavvXf4Lq8BWYoR6kfKo8Nu4HTgK8Jqbo36JyxsNEEUWF?= =?us-ascii?Q?JhrXQNBqzeWIXNsFIGXFDV6oHJ6PBAPZD2UXevy+ugu3PHwDWAzJUmJzfe72?= =?us-ascii?Q?lPd38QyJHTJEJghPzNc1iv6WtKP7vZNRZcw3oK+wEshC3EF7ikpWNGIlu6iB?= =?us-ascii?Q?23rmrkJCBgWFfPpcqtxJ5aQQKE/Q8JWS+1AbfO9gtv1Z8reNwizghV9RyiF2?= =?us-ascii?Q?d+duftZO0P0NhwstyeBWznq3i3NtHR3f3Jh1mispvmE77DquNSPIPZF6JLLf?= =?us-ascii?Q?cqCLe7veWM70Fu6qZhNTH69tlOjF7Oj2rjVo/GykACd268yvqkgsrxdlNtyO?= =?us-ascii?Q?yGElnmc2xFFytgCGFJrOx/lVzb3KSIUEStj2xHdTokMTTtWyHeG+j/LexE1x?= =?us-ascii?Q?0mETpy+6YwMdaASqz6AUrHUSyoYzPFdMyRoHSPHzq1p7sFQqc5pfi52RMr5S?= =?us-ascii?Q?KiDIefQbtd3O6OhrN4FZZc8JNB5sXuxg6vwm8yAuGtdZhrVnSvJJ09Zih1tv?= =?us-ascii?Q?1uCQp1rDYOVBbviY2RiIvz7NAJBuIBxWh5VI+v7SNEt16Kus/UDzVz7Q/Gea?= =?us-ascii?Q?+R2ZRSe+9H6NhJkezEVTiD/U8JPPec/4XEu6Lpzw6J4SbvdzXzFn7XqVXjc9?= =?us-ascii?Q?BsXvXJmV2PXhv9gnJYDk9QLrxUQxgOKpquDPR1CrZAVA42n1i1ydVqOUR0zK?= =?us-ascii?Q?fYwIK7j5cn4qVvdmlQ+WjLlyt9I4kMu6+dBdntgzDmOod2nLF0nxu0Psstar?= =?us-ascii?Q?8wELN/sL5K0WK53lpZ7DvwDr2ZVu0OvzLa5gJEuZQ40Sen/IpI+QCNu5Dlnd?= =?us-ascii?Q?irfNkEgt/sEYoqU1kuw9d+Hg393hRcJAq8Zg7mk3qPBG39ONe877GyZInEY2?= =?us-ascii?Q?CIyrgPytJmucrvMhQjaH2PmHr+HU53pgtOLVaxutEn7EcTHWuWclSQu/ot9b?= =?us-ascii?Q?mVLXHVTHdeUio5QLRxmpHCRz2tifuZ7if7b6QfJUVT+GBHJ4tNGXYCFdf5pB?= =?us-ascii?Q?Fg/xNs2zU4akCJkFGWDEkHjzqlvy5CiVhGmDZBlNl6JyoXyOih/h3pqFLIoQ?= =?us-ascii?Q?gxwogDEbCxVuaKcVnfjR0zCXN1XTGQsSr4jSOPpuF/YP+leQ/6uE38Pmgna1?= =?us-ascii?Q?UH/9QFQTtNLGLwwc3L0Ac+rMEJM1c5RpjhxpmwFh1rhzzbe32RqsE8Te9apy?= =?us-ascii?Q?fJE8qJPLZ2vnNlW6uIyECTikksrDhFRFxua4APyOYg5CGkr5H9rZm4+OsqeB?= =?us-ascii?Q?xDzoS5BvB9E/FDMq8V5GszWHYvN3EekcfjX11+jdtaK2Sps8d3g7aGF3R0cL?= =?us-ascii?Q?/6jhF2KhllvFvECio+kLMZd4r2ytTpa7zQAtpCsY4Wc8BupKtHNcjam+4Gcx?= =?us-ascii?Q?i+XqKyqhgNjefNIfilaFsYpj+FC9ZKoSkpGs8M4l9jOUw+J8wIW5BfVVJT6s?= =?us-ascii?Q?gRXv0RcWXgj99iROv/FmzMvxTLwKZ6Eg7b9b3JgdKsSBOk168IdrLQ4mCb1b?= =?us-ascii?Q?jA=3D=3D?= X-OriginatorOrg: corigine.com X-MS-Exchange-CrossTenant-Network-Message-Id: c6e8772f-e5c0-4c5e-fc58-08dc900bd958 X-MS-Exchange-CrossTenant-AuthSource: SJ0PR13MB5545.namprd13.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 19 Jun 2024 02:59:34.8108 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: fe128f2c-073b-4c20-818e-7246a585940c X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: 6WANOoZeOx0dc2K9FvX4yKfZnTEqNPEePBaeYDk+c+vAkaaD6q4Plgzgpr9g/wswY8hkv0diqur9EP0xwe+/CLkedv4cMKUhyoq5K9Pir3s= X-MS-Exchange-Transport-CrossTenantHeadersStamped: SN4PR13MB5775 X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org From: Long Wu Use AVX2 instructions to accelerate Tx performance. The acceleration only works on X86 machine. Signed-off-by: Long Wu Reviewed-by: Chaoyong He --- drivers/net/nfp/meson.build | 15 + drivers/net/nfp/nfdk/nfp_nfdk.h | 1 + drivers/net/nfp/nfdk/nfp_nfdk_dp.c | 12 + drivers/net/nfp/nfdk/nfp_nfdk_vec.h | 36 ++ drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c | 432 ++++++++++++++++++++ drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c | 14 + drivers/net/nfp/nfp_ethdev.c | 3 +- drivers/net/nfp/nfp_ethdev_vf.c | 3 +- drivers/net/nfp/nfp_rxtx.h | 5 +- drivers/net/nfp/nfp_rxtx_vec.h | 13 + drivers/net/nfp/nfp_rxtx_vec_avx2.c | 21 + drivers/net/nfp/nfp_rxtx_vec_stub.c | 16 + 12 files changed, 568 insertions(+), 3 deletions(-) create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec.h create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c create mode 100644 drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c create mode 100644 drivers/net/nfp/nfp_rxtx_vec.h create mode 100644 drivers/net/nfp/nfp_rxtx_vec_avx2.c create mode 100644 drivers/net/nfp/nfp_rxtx_vec_stub.c diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build index d805644ec5..eb54df5348 100644 --- a/drivers/net/nfp/meson.build +++ b/drivers/net/nfp/meson.build @@ -16,6 +16,7 @@ sources = files( 'flower/nfp_flower_service.c', 'nfd3/nfp_nfd3_dp.c', 'nfdk/nfp_nfdk_dp.c', + 'nfdk/nfp_nfdk_vec_stub.c', 'nfpcore/nfp_cppcore.c', 'nfpcore/nfp_crc.c', 'nfpcore/nfp_elf.c', @@ -43,7 +44,21 @@ sources = files( 'nfp_net_flow.c', 'nfp_net_meta.c', 'nfp_rxtx.c', + 'nfp_rxtx_vec_stub.c', 'nfp_service.c', ) +if arch_subdir == 'x86' + avx2_sources = files( + 'nfdk/nfp_nfdk_vec_avx2_dp.c', + 'nfp_rxtx_vec_avx2.c', + ) + nfp_avx2_lib = static_library('nfp_avx2_lib', + avx2_sources, + dependencies: [static_rte_ethdev, static_rte_bus_pci, + static_rte_common_nfp], + c_args: [cflags, '-mavx2']) + objs += nfp_avx2_lib.extract_all_objects(recursive: true) +endif + deps += ['hash', 'security', 'common_nfp'] diff --git a/drivers/net/nfp/nfdk/nfp_nfdk.h b/drivers/net/nfp/nfdk/nfp_nfdk.h index 89a98d13f3..29d862f6f0 100644 --- a/drivers/net/nfp/nfdk/nfp_nfdk.h +++ b/drivers/net/nfp/nfdk/nfp_nfdk.h @@ -222,5 +222,6 @@ int nfp_net_nfdk_tx_maybe_close_block(struct nfp_net_txq *txq, int nfp_net_nfdk_set_meta_data(struct rte_mbuf *pkt, struct nfp_net_txq *txq, uint64_t *metadata); +void nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev); #endif /* __NFP_NFDK_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c index 77021aa612..e6847ccb33 100644 --- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c +++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c @@ -11,6 +11,8 @@ #include "../flower/nfp_flower.h" #include "../nfp_logs.h" #include "../nfp_net_meta.h" +#include "../nfp_rxtx_vec.h" +#include "nfp_nfdk_vec.h" #define NFDK_TX_DESC_GATHER_MAX 17 @@ -517,6 +519,7 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, dev->data->tx_queues[queue_idx] = txq; txq->hw = hw; txq->hw_priv = dev->process_private; + txq->simple_always = true; /* * Telling the HW about the physical address of the TX ring and number @@ -527,3 +530,12 @@ nfp_net_nfdk_tx_queue_setup(struct rte_eth_dev *dev, return 0; } + +void +nfp_net_nfdk_xmit_pkts_set(struct rte_eth_dev *eth_dev) +{ + if (nfp_net_get_avx2_supported()) + eth_dev->tx_pkt_burst = nfp_net_nfdk_vec_avx2_xmit_pkts; + else + eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec.h b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h new file mode 100644 index 0000000000..14319d6cf6 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec.h @@ -0,0 +1,36 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#ifndef __NFP_NFDK_VEC_H__ +#define __NFP_NFDK_VEC_H__ + +#include + +#include + +#include "../nfp_net_common.h" +#include "nfp_nfdk.h" + +static inline bool +nfp_net_nfdk_is_simple_packet(struct rte_mbuf *pkt, + struct nfp_net_hw *hw) +{ + if (pkt->data_len > NFDK_TX_MAX_DATA_PER_HEAD) + return false; + + if ((hw->super.cap & NFP_NET_CFG_CTRL_LSO_ANY) == 0) + return true; + + if ((pkt->ol_flags & RTE_MBUF_F_TX_TCP_SEG) == 0) + return true; + + return false; +} + +uint16_t nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts); + +#endif /* __NFP_NFDK_VEC_H__ */ diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c new file mode 100644 index 0000000000..6d1359fdb1 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_avx2_dp.c @@ -0,0 +1,432 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include +#include +#include + +#include "../nfp_logs.h" +#include "nfp_nfdk.h" +#include "nfp_nfdk_vec.h" + +/* + * One simple packet needs 2 descriptors so if send 4 packets driver will use + * 8 descriptors at once. + */ +#define NFDK_SIMPLE_BURST_DES_NUM 8 + +#define NFDK_SIMPLE_DES_TYPE (NFDK_DESC_TX_EOP | \ + (NFDK_DESC_TX_TYPE_HEAD & (NFDK_DESC_TX_TYPE_SIMPLE << 12))) + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(struct rte_mbuf *pkt, + struct nfp_net_txq *txq, + uint64_t *des_addr, + uint64_t *des_meta, + bool repr_flag) +{ + int ret; + __m128i dma_addr; + __m128i dma_hi; + __m128i data_off; + __m128i dlen_type; + uint64_t metadata; + + if (repr_flag) { + metadata = NFDK_DESC_TX_CHAIN_META; + } else { + ret = nfp_net_nfdk_set_meta_data(pkt, txq, &metadata); + if (unlikely(ret != 0)) + return ret; + } + + data_off = _mm_set_epi64x(0, pkt->data_off); + dma_addr = _mm_add_epi64(_mm_loadu_si128((__m128i *)&pkt->buf_addr), data_off); + dma_hi = _mm_srli_epi64(dma_addr, 32); + + dlen_type = _mm_set_epi64x(0, (pkt->data_len - 1) | NFDK_SIMPLE_DES_TYPE); + + *des_addr = _mm_extract_epi64(_mm_add_epi64(_mm_unpacklo_epi32(dma_hi, dma_addr), + _mm_slli_epi64(dlen_type, 16)), 0); + + *des_meta = nfp_net_nfdk_tx_cksum(txq, pkt, metadata); + + return 0; +} + +static inline int +nfp_net_nfdk_vec_avx2_xmit_simple_send1(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf *pkt, + bool repr_flag) +{ + int ret; + __m128i des_data; + uint64_t des_addr; + uint64_t des_meta; + + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt, txq, &des_addr, + &des_meta, repr_flag); + if (unlikely(ret != 0)) + return ret; + + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt->data_len; + else + txq->data_pending = 0; + + des_data = _mm_set_epi64x(des_meta, des_addr); + + _mm_store_si128((void *)txds, des_data); + + return 0; +} + +static inline int +nfp_vec_avx2_nfdk_xmit_simple_send4(struct nfp_net_txq *txq, + struct nfp_net_nfdk_tx_desc *txds, + struct rte_mbuf **pkt, + bool repr_flag) +{ + int ret; + uint16_t i; + __m256i des_data0_1; + __m256i des_data2_3; + uint64_t des_addr[4]; + uint64_t des_meta[4]; + + for (i = 0; i < 4; i++) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_set_des2(pkt[i], txq, + &des_addr[i], &des_meta[i], repr_flag); + if (unlikely(ret != 0)) + return ret; + } + + for (i = 0; i < 4; i++) { + txq->wr_p = D_IDX(txq, txq->wr_p + NFDK_TX_DESC_PER_SIMPLE_PKT); + if ((txq->wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) != 0) + txq->data_pending += pkt[i]->data_len; + else + txq->data_pending = 0; + } + + des_data0_1 = _mm256_set_epi64x(des_meta[1], des_addr[1], des_meta[0], des_addr[0]); + des_data2_3 = _mm256_set_epi64x(des_meta[3], des_addr[3], des_meta[2], des_addr[2]); + + _mm256_store_si256((void *)txds, des_data0_1); + _mm256_store_si256((void *)(txds + 4), des_data2_3); + + return 0; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(struct rte_mbuf **mbuf, + struct rte_mbuf **tx_pkts) +{ + __m256i mbuf_room0_1; + __m256i mbuf_room2_3; + + mbuf_room0_1 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[1], 0, + (uintptr_t)tx_pkts[0]); + mbuf_room2_3 = _mm256_set_epi64x(0, (uintptr_t)tx_pkts[3], 0, + (uintptr_t)tx_pkts[2]); + + _mm256_store_si256((void *)mbuf, mbuf_room0_1); + _mm256_store_si256((void *)(mbuf + 4), mbuf_room2_3); +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_pkts(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts, + uint16_t simple_close, + bool repr_flag) +{ + int ret; + uint16_t npkts = 0; + uint16_t need_txds; + uint16_t free_descs; + struct rte_mbuf **lmbuf; + struct nfp_net_nfdk_tx_desc *ktxds; + + PMD_TX_LOG(DEBUG, "Working for queue %hu at pos %u and %hu packets", + txq->qidx, txq->wr_p, nb_pkts); + + need_txds = nb_pkts << 1; + if (nfp_net_nfdk_free_tx_desc(txq) < need_txds || nfp_net_nfdk_txq_full(txq)) + nfp_net_tx_free_bufs(txq); + + free_descs = nfp_net_nfdk_free_tx_desc(txq); + if (unlikely(free_descs < NFDK_TX_DESC_PER_SIMPLE_PKT)) { + if (unlikely(simple_close > 0)) + goto xmit_end; + + return 0; + } + + PMD_TX_LOG(DEBUG, "Queue: %hu. Sending %hu packets", txq->qidx, nb_pkts); + + /* Sending packets */ + while (npkts < nb_pkts && free_descs >= NFDK_TX_DESC_PER_SIMPLE_PKT) { + ktxds = &txq->ktxds[txq->wr_p]; + lmbuf = &txq->txbufs[txq->wr_p].mbuf; + + /* + * If can not send burst, just send one. + * 1. Tx ring will come to the tail. + * 2. Do not need to send 4 packets. + * 3. If pointer address unaligned on 32-bit boundary. + * 4. If free descriptors are not enough. + */ + if ((txq->tx_count - txq->wr_p) < NFDK_SIMPLE_BURST_DES_NUM || + (nb_pkts - npkts) < 4 || + ((uintptr_t)ktxds & 0x1F) != 0 || + free_descs < NFDK_SIMPLE_BURST_DES_NUM) { + ret = nfp_net_nfdk_vec_avx2_xmit_simple_send1(txq, + ktxds, tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free(*lmbuf); + + _mm_storel_epi64((void *)lmbuf, + _mm_loadu_si128((void *)&tx_pkts[npkts])); + npkts++; + free_descs -= NFDK_TX_DESC_PER_SIMPLE_PKT; + continue; + } + + ret = nfp_vec_avx2_nfdk_xmit_simple_send4(txq, ktxds, + &tx_pkts[npkts], repr_flag); + if (unlikely(ret != 0)) + goto xmit_end; + + rte_pktmbuf_free_bulk(lmbuf, NFDK_SIMPLE_BURST_DES_NUM); + + nfp_net_nfdk_vec_avx2_xmit_mbuf_store4(lmbuf, &tx_pkts[npkts]); + + npkts += 4; + free_descs -= NFDK_SIMPLE_BURST_DES_NUM; + } + +xmit_end: + /* Increment write pointers. Force memory write before we let HW know */ + rte_wmb(); + nfp_qcp_ptr_add(txq->qcp_q, NFP_QCP_WRITE_PTR, ((npkts << 1) + simple_close)); + + return npkts; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_close_block(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t i; + uint16_t wr_p; + uint16_t nop_slots; + __m128i zero_128 = _mm_setzero_si128(); + __m256i zero_256 = _mm256_setzero_si256(); + + wr_p = txq->wr_p; + nop_slots = D_BLOCK_CPL(wr_p); + + for (i = nop_slots; i >= 4; i -= 4, wr_p += 4) { + _mm256_store_si256((void *)&txq->ktxds[wr_p], zero_256); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 4); + _mm256_store_si256((void *)&txq->txbufs[wr_p], zero_256); + } + + for (; i >= 2; i -= 2, wr_p += 2) { + _mm_store_si128((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free_bulk(&txq->txbufs[wr_p].mbuf, 2); + _mm_store_si128((void *)&txq->txbufs[wr_p], zero_128); + } + + for (; i >= 1; i--, wr_p++) { + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + } + + txq->data_pending = 0; + txq->wr_p = D_IDX(txq, txq->wr_p + nop_slots); + + (*simple_close) += nop_slots; +} + +static inline uint32_t +nfp_net_nfdk_vec_avx2_xmit_simple_prepare(struct nfp_net_txq *txq, + uint16_t *simple_close) +{ + uint16_t wr_p; + __m128i zero_128 = _mm_setzero_si128(); + + wr_p = txq->wr_p; + + _mm_storel_epi64((void *)&txq->ktxds[wr_p], zero_128); + rte_pktmbuf_free(txq->txbufs[wr_p].mbuf); + _mm_storel_epi64((void *)&txq->txbufs[wr_p], zero_128); + + txq->wr_p = D_IDX(txq, wr_p + 1); + (*simple_close)++; + + return txq->wr_p; +} + +static inline void +nfp_net_nfdk_vec_avx2_xmit_simple_check(struct nfp_net_txq *txq, + struct rte_mbuf *pkt, + bool *simple_flag, + bool *pending_flag, + uint16_t *data_pending, + uint32_t *wr_p, + uint16_t *simple_close) +{ + uint32_t data_pending_temp; + + /* Let the first descriptor index even before send simple packets */ + if (!(*simple_flag)) { + if ((*wr_p & 0x1) == 0x1) + *wr_p = nfp_net_nfdk_vec_avx2_xmit_simple_prepare(txq, simple_close); + + *simple_flag = true; + } + + /* Simple packets only need one close block operation */ + if (!(*pending_flag)) { + if ((*wr_p & (NFDK_TX_DESC_BLOCK_CNT - 1)) == 0) { + *pending_flag = true; + return; + } + + data_pending_temp = *data_pending + pkt->data_len; + if (data_pending_temp > NFDK_TX_MAX_DATA_PER_BLOCK) { + nfp_net_nfdk_vec_avx2_xmit_simple_close_block(txq, simple_close); + *pending_flag = true; + return; + } + + *data_pending = data_pending_temp; + + *wr_p += 2; + } +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_simple_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts, + uint16_t *simple_close) +{ + uint32_t wr_p; + uint16_t simple_idx; + struct rte_mbuf *pkt; + uint16_t data_pending; + bool simple_flag = false; + bool pending_flag = false; + uint16_t simple_count = 0; + + *simple_close = 0; + wr_p = txq->wr_p; + data_pending = txq->data_pending; + + for (simple_idx = head; simple_idx < nb_pkts; simple_idx++) { + pkt = tx_pkts[simple_idx]; + if (!nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + simple_count++; + if (!txq->simple_always) + nfp_net_nfdk_vec_avx2_xmit_simple_check(txq, pkt, &simple_flag, + &pending_flag, &data_pending, &wr_p, simple_close); + } + + return simple_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_others_count(struct nfp_net_txq *txq, + struct rte_mbuf **tx_pkts, + uint16_t head, + uint16_t nb_pkts) +{ + uint16_t others_idx; + struct rte_mbuf *pkt; + uint16_t others_count = 0; + + for (others_idx = head; others_idx < nb_pkts; others_idx++) { + pkt = tx_pkts[others_idx]; + if (nfp_net_nfdk_is_simple_packet(pkt, txq->hw)) + break; + + others_count++; + } + + return others_count; +} + +static inline uint16_t +nfp_net_nfdk_vec_avx2_xmit_common(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + uint16_t i; + uint16_t avail = 0; + uint16_t simple_close; + uint16_t simple_count; + uint16_t simple_avail; + uint16_t others_count; + uint16_t others_avail; + struct nfp_net_txq *txq = tx_queue; + + for (i = 0; i < nb_pkts; i++) { + simple_count = nfp_net_nfdk_vec_avx2_xmit_simple_count(txq, tx_pkts, i, + nb_pkts, &simple_close); + if (simple_count > 0) { + if (!txq->simple_always) + txq->simple_always = true; + + simple_avail = nfp_net_nfdk_vec_avx2_xmit_simple_pkts(txq, + tx_pkts + i, simple_count, simple_close, + false); + + avail += simple_avail; + if (simple_avail != simple_count) + break; + + i += simple_count; + } + + if (i == nb_pkts) + break; + + others_count = nfp_net_nfdk_vec_avx2_xmit_others_count(txq, tx_pkts, + i, nb_pkts); + + if (txq->simple_always) + txq->simple_always = false; + + others_avail = nfp_net_nfdk_xmit_pkts_common(tx_queue, + tx_pkts + i, others_count, false); + + avail += others_avail; + if (others_avail != others_count) + break; + + i += others_count; + } + + return avail; +} + +uint16_t +nfp_net_nfdk_vec_avx2_xmit_pkts(void *tx_queue, + struct rte_mbuf **tx_pkts, + uint16_t nb_pkts) +{ + return nfp_net_nfdk_vec_avx2_xmit_common(tx_queue, tx_pkts, nb_pkts); +} diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c new file mode 100644 index 0000000000..146ec21d51 --- /dev/null +++ b/drivers/net/nfp/nfdk/nfp_nfdk_vec_stub.c @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include "nfp_nfdk_vec.h" + +uint16_t __rte_weak +nfp_net_nfdk_vec_avx2_xmit_pkts(__rte_unused void *tx_queue, + __rte_unused struct rte_mbuf **tx_pkts, + __rte_unused uint16_t nb_pkts) +{ + return 0; +} diff --git a/drivers/net/nfp/nfp_ethdev.c b/drivers/net/nfp/nfp_ethdev.c index 004c725ef8..acf9a73690 100644 --- a/drivers/net/nfp/nfp_ethdev.c +++ b/drivers/net/nfp/nfp_ethdev.c @@ -27,6 +27,7 @@ #include "nfp_ipsec.h" #include "nfp_logs.h" #include "nfp_net_flow.h" +#include "nfp_rxtx_vec.h" /* 64-bit per app capabilities */ #define NFP_NET_APP_CAP_SP_INDIFF RTE_BIT64(0) /* Indifferent to port speed */ @@ -887,7 +888,7 @@ nfp_net_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_net_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_ethdev_vf.c b/drivers/net/nfp/nfp_ethdev_vf.c index a422bcd057..63ea0a5d17 100644 --- a/drivers/net/nfp/nfp_ethdev_vf.c +++ b/drivers/net/nfp/nfp_ethdev_vf.c @@ -14,6 +14,7 @@ #include "nfp_logs.h" #include "nfp_net_common.h" +#include "nfp_rxtx_vec.h" #define NFP_VF_DRIVER_NAME net_nfp_vf @@ -240,7 +241,7 @@ nfp_netvf_ethdev_ops_mount(struct nfp_net_hw *hw, if (hw->ver.extend == NFP_NET_CFG_VERSION_DP_NFD3) eth_dev->tx_pkt_burst = nfp_net_nfd3_xmit_pkts; else - eth_dev->tx_pkt_burst = nfp_net_nfdk_xmit_pkts; + nfp_net_nfdk_xmit_pkts_set(eth_dev); eth_dev->dev_ops = &nfp_netvf_eth_dev_ops; eth_dev->rx_queue_count = nfp_net_rx_queue_count; diff --git a/drivers/net/nfp/nfp_rxtx.h b/drivers/net/nfp/nfp_rxtx.h index 9806384a63..3ddf717da0 100644 --- a/drivers/net/nfp/nfp_rxtx.h +++ b/drivers/net/nfp/nfp_rxtx.h @@ -69,9 +69,12 @@ struct __rte_aligned(64) nfp_net_txq { /** Used by NFDk only */ uint16_t data_pending; + /** Used by NFDk vector xmit only */ + bool simple_always; + /** * At this point 58 bytes have been used for all the fields in the - * TX critical path. We have room for 6 bytes and still all placed + * TX critical path. We have room for 5 bytes and still all placed * in a cache line. */ uint64_t dma; diff --git a/drivers/net/nfp/nfp_rxtx_vec.h b/drivers/net/nfp/nfp_rxtx_vec.h new file mode 100644 index 0000000000..c92660f963 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#ifndef __NFP_RXTX_VEC_AVX2_H__ +#define __NFP_RXTX_VEC_AVX2_H__ + +#include + +bool nfp_net_get_avx2_supported(void); + +#endif /* __NFP_RXTX_VEC_AVX2_H__ */ diff --git a/drivers/net/nfp/nfp_rxtx_vec_avx2.c b/drivers/net/nfp/nfp_rxtx_vec_avx2.c new file mode 100644 index 0000000000..50638e74ab --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_avx2.c @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include +#include + +#include "nfp_rxtx_vec.h" + +bool +nfp_net_get_avx2_supported(void) +{ + if (rte_vect_get_max_simd_bitwidth() >= RTE_VECT_SIMD_256 && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX2) == 1) + return true; + + return false; +} diff --git a/drivers/net/nfp/nfp_rxtx_vec_stub.c b/drivers/net/nfp/nfp_rxtx_vec_stub.c new file mode 100644 index 0000000000..1bc55b67e0 --- /dev/null +++ b/drivers/net/nfp/nfp_rxtx_vec_stub.c @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2024 Corigine, Inc. + * All rights reserved. + */ + +#include + +#include + +#include "nfp_rxtx_vec.h" + +bool __rte_weak +nfp_net_get_avx2_supported(void) +{ + return false; +} -- 2.39.1