From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xueming Li
To: Viacheslav Ovsiienko
CC: Matan Azrad, Shahaf Shuler, Anatoly Burakov
Date: Thu, 27 May 2021 17:02:02 +0300
Message-ID: <20210527140202.19377-5-xuemingl@nvidia.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20210527140202.19377-1-xuemingl@nvidia.com>
References: <20210527133759.17401-1-xuemingl@nvidia.com> <20210527140202.19377-1-xuemingl@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
Subject: [dpdk-dev] [RFC 14/14] net/mlx5: support SubFunction
List-Id: DPDK patches and discussions

This patch introduces SF support. Similar to a VF, an SF on the auxiliary bus is a portion of the hardware PF. Representor and bonding parameters do not apply to an SF.

Devargs to support SF: -a auxiliary:mlx5_core.sf.8,dv_flow_en=1 New global syntax to support SF: -a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1 Signed-off-by: Xueming Li --- doc/guides/nics/mlx5.rst | 339 +++++++++++++++++++++++- drivers/net/mlx5/linux/mlx5_ethdev_os.c | 12 +- drivers/net/mlx5/linux/mlx5_os.c | 142 +++++++--- drivers/net/mlx5/linux/mlx5_os.h | 2 + drivers/net/mlx5/mlx5.c | 10 +- drivers/net/mlx5/mlx5.h | 1 + drivers/net/mlx5/mlx5_rxmode.c | 8 +- drivers/net/mlx5/mlx5_trigger.c | 2 +- 8 files changed, 452 insertions(+), 64 deletions(-) diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index 83299646dd..3f5692038c 100644 --- a/doc/guides/nics/mlx5.rst +++ b/doc/guides/nics/mlx5.rst @@ -403,6 +403,300 @@ Limitations - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported. - Hairpin in switchdev SR-IOV mode is not supported till now. +- Meter: + +Limitations +----------- + +- Windows support: + + On Windows, the features are limited: + + - Promiscuous mode is not supported + - The following rules are supported: + + - IPv4/UDP with CVLAN filtering + - Unicast MAC filtering + +- For secondary process: + + - Forked secondary process not supported. + - External memory unregistered in EAL memseg list cannot be used for DMA + unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in + primary process and remapped to the same virtual address in secondary + process. If the external memory is registered by primary process but has + different virtual address in secondary process, unexpected error may happen. + +- When using Verbs flow engine (``dv_flow_en`` = 0), flow pattern without any + specific VLAN will match for VLAN packets as well: + + When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card. 
+ Meaning, the flow rule:: + + flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ... + + Will only match vlan packets with vid=3. and the flow rule:: + + flow create 0 ingress pattern eth / ipv4 / end ... + + Will match any ipv4 packet (VLAN included). + +- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged(QinQ) match is not supported. + +- When using DV flow engine (``dv_flow_en`` = 1), flow pattern with any VLAN specification will match only single-tagged packets unless the ETH item ``type`` field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1. + The flow rule:: + + flow create 0 ingress pattern eth / ipv4 / end ... + + Will match any ipv4 packet. + The flow rules:: + + flow create 0 ingress pattern eth / vlan / end ... + flow create 0 ingress pattern eth has_vlan is 1 / end ... + flow create 0 ingress pattern eth type is 0x8100 / end ... + + Will match single-tagged packets only, with any VLAN ID value. + The flow rules:: + + flow create 0 ingress pattern eth type is 0x88A8 / end ... + flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ... + + Will match multi-tagged packets only, with any VLAN ID value. + +- A flow pattern with 2 sequential VLAN items is not supported. + +- VLAN pop offload command: + + - Flow rules having a VLAN pop offload command as one of their actions and + are lacking a match on VLAN as one of their items are not supported. + - The command is not supported on egress traffic in NIC mode. + +- VLAN push offload is not supported on ingress traffic in NIC mode. + +- VLAN set PCP offload is not supported on existing headers. + +- A multi segment packet must have not more segments than reported by dev_infos_get() + in tx_desc_lim.nb_seg_max field. This value depends on maximal supported Tx descriptor + size and ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal + inline settings) to 58. 
+ +- Flows with a VXLAN Network Identifier equal (or ends to be equal) + to 0 are not supported. + +- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP. + +- Match on Geneve header supports the following fields only: + + - VNI + - OAM + - protocol type + - options length + +- Match on Geneve TLV option is supported on the following fields: + + - Class + - Type + - Length + - Data + + Only one Class/Type/Length Geneve TLV option is supported per shared device. + Class/Type/Length fields must be specified as well as masks. + Class/Type/Length specified masks must be full. + Matching Geneve TLV option without specifying data is not supported. + Matching Geneve TLV option with ``data & mask == 0`` is not supported. + +- VF: flow rules created on VF devices can only match traffic targeted at the + configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``). + +- Match on GTP tunnel header item supports the following fields only: + + - v_pt_rsv_flags: E flag, S flag, PN flag + - msg_type + - teid + +- Match on GTP extension header only for GTP PDU session container (next + extension header type = 0x85). +- Match on GTP extension header is not supported in group 0. + +- No Tx metadata go to the E-Switch steering domain for the Flow group 0. + The flows within group 0 and set metadata action are rejected by hardware. + +.. note:: + + MAC addresses not already present in the bridge table of the associated + kernel network device will be added and cleaned up by the PMD when closing + the device. In case of ungraceful program termination, some entries may + remain present and should be removed manually by other means. + +- Buffer split offload is supported with regular Rx burst routine only, + no MPRQ feature or vectorized code can be engaged. + +- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be + externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in + ol_flags. 
As the mempool for the external buffer is managed by PMD, all the + Rx mbufs must be freed before the device is closed. Otherwise, the mempool of + the external buffers will be freed by PMD and the application which still + holds the external buffers may be corrupted. + +- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is + enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully + supported. Some Rx packets may not have PKT_RX_RSS_HASH. + +- IPv6 Multicast messages are not supported on VM, while promiscuous mode + and allmulticast mode are both set to off. + To receive IPv6 Multicast messages on VM, explicitly set the relevant + MAC address using rte_eth_dev_mac_addr_add() API. + +- To support a mixed traffic pattern (some buffers from local host memory, some + buffers from other devices) with high bandwidth, a mbuf flag is used. + + An application hints the PMD whether or not it should try to inline the + given mbuf data buffer. PMD should do the best effort to act upon this request. + + The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic, + registered by application with rte_mbuf_dynflag_register(). This flag is + purely driver-specific and declared in PMD specific header ``rte_pmd_mlx5.h``, + which is intended to be used by the application. + + To query the supported specific flags in runtime, + the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of + currently (over present hardware and configuration) supported specific flags. 
+ The "not inline hint" feature operating flow is the following one: + + - application starts + - probe the devices, ports are created + - query the port capabilities + - if port supporting the feature is found + - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` + - application starts the ports + - on ``dev_start()`` PMD checks whether the feature flag is registered and + enables the feature support in datapath + - application might set the registered flag bit in ``ol_flags`` field + of mbuf being sent and PMD will handle ones appropriately. + +- The amount of descriptors in Tx queue may be limited by data inline settings. + Inline data require the more descriptor building blocks and overall block + amount may exceed the hardware supported limits. The application should + reduce the requested Tx size or adjust data inline settings with + ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys. + +- To provide the packet send scheduling on mbuf timestamps the ``tx_pp`` + parameter should be specified. + When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME set on the packet + being sent it tries to synchronize the time of packet appearing on + the wire with the specified packet timestamp. It the specified one + is in the past it should be ignored, if one is in the distant future + it should be capped with some reasonable value (in range of seconds). + These specific cases ("too late" and "distant future") can be optionally + reported via device xstats to assist applications to detect the + time-related problems. + + The timestamp upper "too-distant-future" limit + at the moment of invoking the Tx burst routine + can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23. 
+ Please note, for the testpmd txonly mode, + the limit is deduced from the expression:: + + (n_tx_descriptors / burst_size + 1) * inter_burst_gap + + There is no any packet reordering according timestamps is supposed, + neither within packet burst, nor between packets, it is an entirely + application responsibility to generate packets and its timestamps + in desired order. The timestamps can be put only in the first packet + in the burst providing the entire burst scheduling. + +- E-Switch decapsulation Flow: + + - can be applied to PF port only. + - must specify VF port action (packet redirection from PF to VF). + - optionally may specify tunnel inner source and destination MAC addresses. + +- E-Switch encapsulation Flow: + + - can be applied to VF ports only. + - must specify PF port action (packet redirection from VF to PF). + +- Raw encapsulation: + + - The input buffer, used as outer header, is not validated. + +- Raw decapsulation: + + - The decapsulation is always done up to the outermost tunnel detected by the HW. + - The input buffer, providing the removal size, is not validated. + - The buffer size must match the length of the headers to be removed. + +- ICMP(code/type/identifier/sequence number) / ICMP6(code/type) matching, IP-in-IP and MPLS flow matching are all + mutually exclusive features which cannot be supported together + (see :ref:`mlx5_firmware_config`). + +- LRO: + + - Requires DevX and DV flow to be enabled. + - KEEP_CRC offload cannot be supported with LRO. + - The first mbuf length, without head-room, must be big enough to include the + TCP header (122B). + - Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward + it with size limited to max LRO size, not to max RX packet length. + - LRO can be used with outer header of TCP packets of the standard format: + eth (with or without vlan) / ipv4 or ipv6 / tcp / payload + + Other TCP packets (e.g. 
with MPLS label) received on Rx queue with LRO enabled, will be received with bad checksum. + - LRO packet aggregation is performed by HW only for packet size larger than + ``lro_min_mss_size``. This value is reported on device start, when debug + mode is enabled. + +- CRC: + + - ``DEV_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation + for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, and BlueField-2). + The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support. + +- TX mbuf fast free: + + - fast free offload assumes the all mbufs being sent are originated from the + same memory pool and there is no any extra references to the mbufs (the + reference counter for each mbuf is equal 1 on tx_burst call). The latter + means there should be no any externally attached buffers in mbufs. It is + an application responsibility to provide the correct mbufs if the fast + free offload is engaged. The mlx5 PMD implicitly produces the mbufs with + externally attached buffers if MPRQ option is enabled, hence, the fast + free offload is neither supported nor advertised if there is MPRQ enabled. + +- Sample flow: + + - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and + E-Switch steering domain. + - For E-Switch Sampling flow with sample ratio > 1, additional actions are not + supported in the sample actions list. + - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as + first action in the E-Switch egress flow if with header modify or + encapsulation actions. + - For NIC Rx flow, supports ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the + sample actions list. + - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID``, + ``VXLAN ENCAP``, ``NVGRE ENCAP`` in the sample actions list. + +- Modify Field flow: + + - Supports the 'set' operation only for ``RTE_FLOW_ACTION_TYPE_MODIFY_FIELD`` action. + - Modification of an arbitrary place in a packet via the special ``RTE_FLOW_FIELD_START`` Field ID is not supported. 
+ - Modification of the 802.1Q Tag, VXLAN Network or GENEVE Network ID's is not supported. + - Encapsulation levels are not supported, can modify outermost header fields only. + - Offsets must be 32-bits aligned, cannot skip past the boundary of a field. + +- IPv6 header item 'proto' field, indicating the next header protocol, should + not be set as extension header. + In case the next header is an extension header, it should not be specified in + IPv6 header item 'proto' field. + The last extension header item 'next header' field can specify the following + header protocol type. + +- Hairpin: + + - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported. + - Hairpin in switchdev SR-IOV mode is not supported till now. + - Meter: - All the meter colors with drop action will be counted only by the global drop statistics. @@ -1438,13 +1732,17 @@ the DPDK application. echo switchdev > /sys/class/net//compat/devlink/mode -Sub-Function representor ------------------------- +SubFunction support +------------------- +SubFunction is a portion of the PCI device, a SF netdev has its own +dedicated queues(txq, rxq). A SF shares PCI level resources with other SFs +and/or with its parent PCI function. -Sub-Function is a portion of the PCI device, a SF netdev has its own -dedicated queues(txq, rxq). A SF netdev supports E-Switch representation -offload similar to existing PF and VF representors. A SF shares PCI -level resources with other SFs and/or with its parent PCI function. +0. Requirement:: + + kernel version >= 5.12 or OFED version >= 5.6 + + iproute2 >= 5.11 1. Configure SF feature:: @@ -1457,21 +1755,34 @@ level resources with other SFs and/or with its parent PCI function. 2: 32 SFs 3: 64 SFs -2. Reset the FW:: +2. Enable switchdev mode:: - mlxfwreset -d reset + devlink dev eswitch set pci/ mode switchdev -3. 
Enable switchdev mode:: +3. Add SF port:: - echo switchdev > /sys/class/net//compat/devlink/mode + devlink port add pci/ flavour pcisf pfnum 0 sfnum + + Get SFID from output: pci// + +4. Modify MAC address:: + + devlink port function set pci// hw_addr + +5. Activate SF port:: + + devlink port function set pci// state active -4. Create SF:: +6. Devargs to probe SF device:: - mlnx-sf -d -a create + auxiliary:mlx5_core.sf.9,dv_flow_en=1 -5. Probe SF representor:: +SubFunction representor support +------------------------------- +A SF netdev supports E-Switch representation offload similar to existing PF +and VF representors. Use to probe SF representor. - testpmd> port attach ,representor=sf0,dv_flow_en=1 + testpmd> port attach ,representor=sf,dv_flow_en=1 Performance tuning ------------------ diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c index 6fdb310129..8678502595 100644 --- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c +++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c @@ -128,6 +128,17 @@ struct ethtool_link_settings { #define ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT 2 /* 66 - 64 */ #endif +/* Get interface index from SubFunction device name. */ +int +mlx5_auxiliary_get_ifindex(const char *sf_name) +{ + char if_name[IF_NAMESIZE]; + + if (mlx5_auxiliary_get_child_name(sf_name, "/net", + if_name, sizeof(if_name)) != 0) + return -rte_errno; + return if_nametoindex(if_name); +} /** * Get interface name from private structure. 
@@ -1619,4 +1630,3 @@ mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN]) memcpy(mac, request.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN); return 0; } - diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c index 4f16230fa5..d74273a7ca 100644 --- a/drivers/net/mlx5/linux/mlx5_os.c +++ b/drivers/net/mlx5/linux/mlx5_os.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include #include @@ -1923,6 +1924,27 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev, return pf; } +static void +mlx5_os_config_default(struct mlx5_dev_config *config) +{ + memset(config, 0, sizeof(*config)); + config->mps = MLX5_ARG_UNSET; + config->dbnc = MLX5_ARG_UNSET; + config->rx_vec_en = 1; + config->txq_inline_max = MLX5_ARG_UNSET; + config->txq_inline_min = MLX5_ARG_UNSET; + config->txq_inline_mpw = MLX5_ARG_UNSET; + config->txqs_inline = MLX5_ARG_UNSET; + config->vf_nl_en = 1; + config->mr_ext_memseg_en = 1; + config->mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN; + config->mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS; + config->dv_esw_en = 1; + config->dv_flow_en = 1; + config->decap_en = 1; + config->log_hp_size = MLX5_ARG_UNSET; +} + /** * Register a PCI device within bonding. * @@ -2334,23 +2356,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev, uint32_t restore; /* Default configuration. 
*/ - memset(&dev_config, 0, sizeof(struct mlx5_dev_config)); + mlx5_os_config_default(&dev_config); dev_config.vf = dev_config_vf; - dev_config.mps = MLX5_ARG_UNSET; - dev_config.dbnc = MLX5_ARG_UNSET; - dev_config.rx_vec_en = 1; - dev_config.txq_inline_max = MLX5_ARG_UNSET; - dev_config.txq_inline_min = MLX5_ARG_UNSET; - dev_config.txq_inline_mpw = MLX5_ARG_UNSET; - dev_config.txqs_inline = MLX5_ARG_UNSET; - dev_config.vf_nl_en = 1; - dev_config.mr_ext_memseg_en = 1; - dev_config.mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN; - dev_config.mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS; - dev_config.dv_esw_en = 1; - dev_config.dv_flow_en = 1; - dev_config.decap_en = 1; - dev_config.log_hp_size = MLX5_ARG_UNSET; list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device, &list[i], &dev_config, @@ -2407,6 +2414,35 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev, return ret; } +static int +mlx5_os_parse_eth_devargs(struct rte_device *dev, + struct rte_eth_devargs *eth_da) +{ + int ret = 0; + + if (dev->devargs == NULL) + return 0; + memset(eth_da, 0, sizeof(*eth_da)); + /* Parse representor information first from class argument. */ + if (dev->devargs->cls_str) + ret = rte_eth_devargs_parse(dev->devargs->cls_str, eth_da); + if (ret != 0) { + DRV_LOG(ERR, "failed to parse device arguments: %s", + dev->devargs->cls_str); + return -rte_errno; + } + if (eth_da->type == RTE_ETH_REPRESENTOR_NONE) { + /* Parse legacy device argument */ + ret = rte_eth_devargs_parse(dev->devargs->args, eth_da); + if (ret) { + DRV_LOG(ERR, "failed to parse device arguments: %s", + dev->devargs->args); + return -rte_errno; + } + } + return 0; +} + /** * Callback to register a PCI device. 
* @@ -2421,31 +2457,13 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev, static int mlx5_os_pci_probe(struct rte_pci_device *pci_dev) { - struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE }; + struct rte_eth_devargs eth_da = { .nb_ports = 0 }; int ret = 0; uint16_t p; - if (pci_dev->device.devargs) { - /* Parse representor information from device argument. */ - if (pci_dev->device.devargs->cls_str) - ret = rte_eth_devargs_parse - (pci_dev->device.devargs->cls_str, &eth_da); - if (ret) { - DRV_LOG(ERR, "failed to parse device arguments: %s", - pci_dev->device.devargs->cls_str); - return -rte_errno; - } - if (eth_da.type == RTE_ETH_REPRESENTOR_NONE) { - /* Support legacy device argument */ - ret = rte_eth_devargs_parse - (pci_dev->device.devargs->args, &eth_da); - if (ret) { - DRV_LOG(ERR, "failed to parse device arguments: %s", - pci_dev->device.devargs->args); - return -rte_errno; - } - } - } + ret = mlx5_os_parse_eth_devargs(&pci_dev->device, &eth_da); + if (ret != 0) + return ret; if (eth_da.nb_ports > 0) { /* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */ @@ -2458,10 +2476,53 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev) return ret; } +/* Probe a single SF device on auxiliary bus, no representor support. */ +static int +mlx5_os_auxiliary_probe(struct rte_device *dev) +{ + struct rte_eth_devargs eth_da = { .nb_ports = 0 }; + struct mlx5_dev_config config; + struct mlx5_dev_spawn_data spawn = { .pf_bond = -1 }; + struct rte_auxiliary_device *adev = RTE_DEV_TO_AUXILIARY(dev); + struct rte_eth_dev *eth_dev; + int ret = 0; + + /* Parse ethdev devargs. */ + ret = mlx5_os_parse_eth_devargs(dev, &eth_da); + if (ret != 0) + return ret; + /* Set default config data. */ + mlx5_os_config_default(&config); + config.sf = 1; + /* Init spawn data. 
*/ + spawn.max_port = 1; + spawn.phys_port = 1; + spawn.phys_dev = mlx5_get_ibv_device(dev); + ret = mlx5_auxiliary_get_ifindex(dev->name); + if (ret < 0) { + DRV_LOG(ERR, "failed to get ethdev ifindex: %s", dev->name); + return ret; + } + spawn.ifindex = ret; + /* Spawn device. */ + eth_dev = mlx5_dev_spawn(dev, &spawn, &config, &eth_da); + if (eth_dev == NULL) + return -rte_errno; + /* Post create. */ + eth_dev->intr_handle = &adev->intr_handle; + if (rte_eal_process_type() == RTE_PROC_PRIMARY) { + eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC; + eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV; + eth_dev->data->numa_node = dev->numa_node; + } + rte_eth_dev_probing_finish(eth_dev); + return 0; +} + /** * Common bus driver callback to probe a device. * - * This function probe PCI bus device(s). + * This function probe PCI bus device(s) or a single SF on auxiliary bus. * * @param[in] dev * Pointer to the generic device. @@ -2484,7 +2545,8 @@ mlx5_os_net_probe(struct rte_device *dev) } if (mlx5_dev_is_pci(dev)) return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev)); - return 0; + else + return mlx5_os_auxiliary_probe(dev); } static int diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h index af7cbeb418..2991d37df2 100644 --- a/drivers/net/mlx5/linux/mlx5_os.h +++ b/drivers/net/mlx5/linux/mlx5_os.h @@ -19,4 +19,6 @@ enum { #define MLX5_NAMESIZE IF_NAMESIZE +int mlx5_auxiliary_get_ifindex(const char *sf_name); + #endif /* RTE_PMD_MLX5_OS_H_ */ diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index 3defdb2db3..69edd55b86 100644 --- a/drivers/net/mlx5/mlx5.c +++ b/drivers/net/mlx5/mlx5.c @@ -2319,10 +2319,12 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev) if (opriv->sh == priv->sh || odev->device == dev->device) break; - } else if (dev->device != NULL && 
dev->device->driver != NULL && + dev->device->driver->name != NULL && + (strcmp(dev->device->driver->name, + MLX5_PCI_DRIVER_NAME) == 0 || + strcmp(dev->device->driver->name, + MLX5_AUXILIARY_DRIVER_NAME) == 0)) { /* odev not specified, found all mlx5 devices. */ break; } diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index 27bb34e827..b06f45fc54 100644 --- a/drivers/net/mlx5/mlx5.h +++ b/drivers/net/mlx5/mlx5.h @@ -220,6 +220,7 @@ struct mlx5_dev_config { unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */ unsigned int hw_padding:1; /* End alignment padding is supported. */ unsigned int vf:1; /* This is a VF. */ + unsigned int sf:1; /* This is a SF. */ unsigned int tunnel_en:1; /* Whether tunnel stateless offloads are supported. */ unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */ diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c index 25fb47c9ed..7f19b235c2 100644 --- a/drivers/net/mlx5/mlx5_rxmode.c +++ b/drivers/net/mlx5/mlx5_rxmode.c @@ -36,7 +36,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev) dev->data->port_id); return 0; } - if (priv->config.vf) { + if (priv->config.vf || priv->config.sf) { ret = mlx5_os_set_promisc(dev, 1); if (ret) return ret; @@ -69,7 +69,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev) int ret; dev->data->promiscuous = 0; - if (priv->config.vf) { + if (priv->config.vf || priv->config.sf) { ret = mlx5_os_set_promisc(dev, 0); if (ret) return ret; @@ -109,7 +109,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev) dev->data->port_id); return 0; } - if (priv->config.vf) { + if (priv->config.vf || priv->config.sf) { ret = mlx5_os_set_allmulti(dev, 1); if (ret) goto error; @@ -142,7 +142,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev) int ret; dev->data->all_multicast = 0; - if (priv->config.vf) { + if (priv->config.vf || priv->config.sf) { ret = mlx5_os_set_allmulti(dev, 0); if (ret) goto error; diff --git a/drivers/net/mlx5/mlx5_trigger.c 
b/drivers/net/mlx5/mlx5_trigger.c index 6c8a64ce03..e4e057a6f8 100644 --- a/drivers/net/mlx5/mlx5_trigger.c +++ b/drivers/net/mlx5/mlx5_trigger.c @@ -1259,7 +1259,7 @@ mlx5_traffic_enable(struct rte_eth_dev *dev) } mlx5_txq_release(dev, i); } - if (priv->config.dv_esw_en && !priv->config.vf) { + if (priv->config.dv_esw_en && !priv->config.vf && !priv->config.sf) { if (mlx5_flow_create_esw_table_zero_flow(dev)) priv->fdb_def_rule = 1; else -- 2.25.1