From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id A4D5AA0C4C; Fri, 6 Aug 2021 10:56:47 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 54A8041299; Fri, 6 Aug 2021 10:56:47 +0200 (CEST) Received: from NAM10-DM6-obe.outbound.protection.outlook.com (mail-dm6nam10on2071.outbound.protection.outlook.com [40.107.93.71]) by mails.dpdk.org (Postfix) with ESMTP id B7F494014D for ; Fri, 6 Aug 2021 10:56:45 +0200 (CEST) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=TUSjYKcVp1KDHW6NHxV58Lt7VyrhmvywLmwQ/uzdpStX3SvhgYeVv9ZB+OO//ye4jheFqIehE6NIoAAKn9kBsvZgZ3QbfEVUwfKbBvTcQEtEefRoQoy3edFAW93wsxI8S8BlQPZPbOPMfpBXAg1ngQrKsu62MmCI74A4omxA/USQurFJTUR61no+BtcteijtGM8pcsdXdszxiYASTxaZKjOYuFSz8mzqGYYxfQ/hvlHzykI1C1uLTxf80NlRCD7E2QflkCOm5vLTGtc57/zbhPX722QWk4B5cpIWVp4wEVto41qILgskFvpUAQSLEiNsq4Semlzu94jA9P3u8SYr1g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=UV2dXBWNChrFIfm3tV3zjBDYf8hvpYzk9U3KJp0ZlPk=; b=mifsRCsf3Vc1mAPcZ/lgQ+1WA5UypsvCplaaU/6ZIoCT3d5SG3N/X/RjwbEJSGXth/G0CaaaQSuJ4PVG/8pe2rdv7TMYsIm5VU4DvO3Fhaokf2wCGmENGdL65OVV0ITJECY/EPKByxF0md7d0QzfGtJWzFZ3LqzC0u22l8JPZdYkWVs2GxYiL6ewWLXFTdTSsZu0Nv60CKgh8Baveywr0UzzVgOKnPJCj1lk3o2pqBjwn8kF917mlQ/0YkooRIXaTYh3VimE8ycbnF9BEjxtlvWoAlaCFFFkgp4+pYVLGp0TBg2+GYExBgF23yPLAmcB7N/ql3InbimPhhZa/VUKbQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.112.35) smtp.rcpttodomain=dpdk.org smtp.mailfrom=nvidia.com; dmarc=pass (p=quarantine sp=none pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=UV2dXBWNChrFIfm3tV3zjBDYf8hvpYzk9U3KJp0ZlPk=; b=KDeyNyEDoaNrUCIBJwSL4q3dd1VmGbsXSWswSRPFthhSjswbpiZD8VxCwDXiYekYsd9TOp/ZAMxRJDZEFuC61IFFLM9bAU+CXFciqc0HPdCHoZ9ZsRM62yh66T3ll9iYuBYQRGdbkV15zIsBLbRwb2aXaPMaFOyq5U1q349H0ramL62fVcgOA7SryX5WkkKhLKEk4EBxZfv1cawNQlA0Mx4Ci0oJeu1DGzcLBTrWh6qL75/KnXMhVIfuc4BtOgIca1LcDcLL3NHecwBHA3wBJbekMuhOyh59Mdij6w7VdqzT1k0Rq4IpPio9q5yLMoDMr8vs0uksJzhlI/J96nvhlw== Received: from BN6PR13CA0047.namprd13.prod.outlook.com (2603:10b6:404:13e::33) by BYAPR12MB2645.namprd12.prod.outlook.com (2603:10b6:a03:61::30) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4373.26; Fri, 6 Aug 2021 08:56:43 +0000 Received: from BN8NAM11FT050.eop-nam11.prod.protection.outlook.com (2603:10b6:404:13e:cafe::f5) by BN6PR13CA0047.outlook.office365.com (2603:10b6:404:13e::33) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.4415.5 via Frontend Transport; Fri, 6 Aug 2021 08:56:43 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.112.35) smtp.mailfrom=nvidia.com; dpdk.org; dkim=none (message not signed) header.d=none;dpdk.org; dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.112.35 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.112.35; helo=mail.nvidia.com; Received: from mail.nvidia.com (216.228.112.35) by BN8NAM11FT050.mail.protection.outlook.com (10.13.177.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384) id 15.20.4394.16 via Frontend Transport; Fri, 6 Aug 2021 08:56:43 +0000 Received: from DRHQMAIL107.nvidia.com (10.27.9.16) by HQMAIL111.nvidia.com (172.20.187.18) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 6 Aug 2021 08:56:42 +0000 Received: from nvidia.com (172.20.187.6) by DRHQMAIL107.nvidia.com (10.27.9.16) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Fri, 6 Aug 2021 08:56:40 +0000 From: Viacheslav Ovsiienko To: CC: , , , , , Date: Fri, 6 Aug 2021 11:56:24 +0300 Message-ID: <20210806085624.16497-1-viacheslavo@nvidia.com> X-Mailer: git-send-email 2.18.1 MIME-Version: 1.0 Content-Type: text/plain X-Originating-IP: [172.20.187.6] X-ClientProxiedBy: HQMAIL101.nvidia.com (172.20.187.10) To DRHQMAIL107.nvidia.com (10.27.9.16) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: f5bbf3c1-4664-4f3e-79dc-08d958b81d05 X-MS-TrafficTypeDiagnostic: BYAPR12MB2645: X-LD-Processed: 43083d15-7273-40c1-b7db-39efd9ccc17a,ExtAddr X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: HT0cWDWNG8ea5j3ANL4ixm7FLvaiWI7wzUaiMM0lBL9UQooS6IU0Lir3ztwx5jUJPWtIQGIg0zB0decJK+Jlm7dcqMSDtsTOTtXcVFFcTkZwQ3/9G8PRkDgAEqoHiOvU6NuJqFnpaZ6xSUYw2pBckwfJFmHByAdO2u0fU+/cKpVryMLyXuem8zFgDkr3kf6epOKDMeBHjj+rv+lMez/NkcILY5IZ1OYaoTnBzpyDKrvPInECsPcx7KPjsgYKTFHXh+nx5MWDQoe+1N56DllIcS5apRIu6h8iqNhafkoxmWXjTcUXAvOz7wpyL4U+sNYe9LEoBzyLbQPf1UjoubiyS69Rwcl/U9HP/cvCcFGbwFlgjqNfnynN6Gmb6fk/HCkjELH90DVQBAnR1EDfQfFiqq5amsK6cNhMvQpc0MaRmKDFL5d8psG04pcbKzklSyjv3hh3U1TNNv0cFJZTXj+fysRN1jzAJTqr8P7KwfKNtlc4zbEQPyHC3dEjO8trc8y21+MwavMd+ParNMVUWthJL552ltwn1k6BUQhBiAG8/p7zxFkBtYSbLu25IdmYUVjEBDAtzoB2AHSM2pJuilKBmncEjWm6Brf4amyIPxSGvN+cnQwBaSh9aAC6yGGjWZX8/G9yK2EpuHkZLGxw566+hQPVb2aLNhHHThiV3bYlwQur2iU7GhEm0IQvwu8/owIJilhdDUaGsJkYGV/W+T+8Dw== X-Forefront-Antispam-Report: CIP:216.228.112.35; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:schybrid02.nvidia.com; CAT:NONE; SFS:(4636009)(39860400002)(376002)(346002)(136003)(396003)(36840700001)(46966006)(8676002)(8936002)(36906005)(316002)(2616005)(426003)(1076003)(7636003)(36860700001)(2906002)(54906003)(36756003)(6666004)(30864003)(5660300002)(16526019)(82740400003)(6916009)(7696005)(4326008)(70206006)(55016002)(47076005)(82310400003)(83380400001)(26005)(6286002)(478600001)(336012)(186003)(70586007)(356005)(86362001); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 06 Aug 2021 08:56:43.1589 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: f5bbf3c1-4664-4f3e-79dc-08d958b81d05 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.112.35]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: BN8NAM11FT050.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BYAPR12MB2645 Subject: [dpdk-dev] [RFC] ethdev: introduce configurable flexible item X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" 1. Introduction and Retrospective Nowadays the networks are evolving fast and wide, the network structures are getting more and more complicated, the new application areas are emerging. To address these challenges the new network protocols are continuously being developed, considered by technical communities, adopted by industry and, eventually implemented in hardware and software. The DPDK framework follows the common trends and if we bother to glance at the RTE Flow API header we see the multiple new items were introduced during the last years since the initial release. The new protocol adoption and implementation process is not straightforward and takes time, the new protocol passes development, consideration, adoption, and implementation phases. The industry tries to mitigate and address the forthcoming network protocols, for example, many hardware vendors are implementing flexible and configurable network protocol parsers. As DPDK developers, could we anticipate the near future in the same fashion and introduce the similar flexibility in RTE Flow API? Let's check what we already have merged in our project, and we see the nice raw item (rte_flow_item_raw). At the first glance, it looks superior and we can try to implement a flow matching on the header of some relatively new tunnel protocol, say on the GENEVE header with variable length options. And, under further consideration, we run into the raw item limitations: - only fixed size network header can be represented - the entire network header pattern of fixed format (header field offsets are fixed) must be provided - the search for patterns is not robust (the wrong matches might be triggered), and actually is not supported by existing PMDs - no explicitly specified relations with preceding and following items - no tunnel hint support As the result, implementing the support for tunnel protocols like aforementioned GENEVE with variable extra protocol option with flow raw item becomes very complicated and would require multiple flows and multiple raw items chained in the same flow (by the way, there is no support found for chained raw items in implemented drivers). This RFC introduces the dedicated flex item (rte_flow_item_flex) to handle matches with existing and new network protocol headers in a unified fashion. 2. Flex Item Lifecycle Let's assume there are the requirements to support the new network protocol with RTE Flows. What is given within protocol specification: - header format - header length, (can be variable, depending on options) - potential presence of extra options following or included in the header the header - the relations with preceding protocols. For example, the GENEVE follows UDP, eCPRI can follow either UDP or L2 header - the relations with following protocols. For example, the next layer after tunnel header can be L2 or L3 - whether the new protocol is a tunnel and the header is a splitting point between outer and inner layers The supposed way to operate with flex item: - application defines the header structures according to protocol specification - application calls rte_flow_flex_item_create() with desired configuration according to the protocol specification, it creates the flex item object over specified ethernet device and prepares PMD and underlying hardware to handle flex item. On item creation call PMD backing the specified ethernet device returns the opaque handle identifying the object have been created - application uses the rte_flow_item_flex with obtained handle in the flows, the values/masks to match with fields in the header are specified in the flex item per flow as for regular items (except that pattern buffer combines all fields) - flows with flex items match with packets in a regular fashion, the values and masks for the new protocol header match are taken from the flex items in the flows - application destroys flows with flex items - application calls rte_flow_flex_item_release() as part of ethernet device API and destroys the flex item object in PMD and releases the engaged hardware resources 3. Flex Item Structure The flex item structure is intended to be used as part of the flow pattern like regular RTE flow items and provides the mask and value to match with fields of the protocol item was configured for. struct rte_flow_item_flex { void *handle; uint32_t length; const uint8_t* pattern; }; The handle is some opaque object maintained on per device basis by underlying driver. The protocol header fields are considered as bit fields, all offsets and widths are expressed in bits. The pattern is the buffer containing the bit concatenation of all the fields presented at item configuration time, in the same order and same amount. If byte boundary alignment is needed an application can use a dummy type field, this is just some kind of gap filler. The length field specifies the pattern buffer length in bytes and is needed to allow rte_flow_copy() operations. The approach of multiple pattern pointers and lengths (per field) was considered and found clumsy - it seems to be much suitable for the application to maintain the single structure within the sinlge pattern buffer. 4. Flex Item Configuration The flex item configuration consists of the following parts: - header field descriptors: - next header - next protocol - sample to match - input link descriptors - output link descriptors The field descriptors tell driver and hardware what data should be extracted from the packet and then presented to match in the flows. Each field is a bit pattern. It has width, offset from the header beginning, mode of offset calculation, and offset related parameters. The next header field is special, no data are actually taken from the packet, but its offset is used as pointer to the next header in the packet, in other word the next header offset specifies the size of the header being parsed by flex item. There is one more special field - next protocol, it specifies where the next protocol identifier is contained and packet data sampled from this field will be used to determine the next protocol header type to continue packet pasring. The next protocol field is like eth_type field in MAC2, or proto field in IPv4/v6 headers. The sample fields are used to represent the data be sampled from the packet and then matched with established flows. There are several methods supposed to calculate field offset in runtime depending on configuration and packet content: - FIELD_MODE_FIXED - fixed offset. The bit offset from header beginning is permanent and defined by field_base configuration parameter. - FIELD_MODE_OFFSET - the field bit offset is extracted from other header field (indirect offset field). The resulting field offset to match is calculated from as: field_base + (*field_offset & offset_mask) << field_shift This mode is useful to sample some extra options following the main header with field containing main header length. Also, this mode can be used to calculate offset to the next protocol header, for example - IPv4 header contains the 4-bit field with IPv4 header length expressed in dwords. One more example - this mode would allow us to skip GENEVE header variable length options. - FIELD_MODE_BITMASK - the field bit offset is extracted from other header field (indirect offset field), the latter is considered as bitmask containing some number of one bits, the resulting field offset to match is calculated as: field_base + bitcount(*field_offset & offset_mask) << field_shift This mode would be useful to skip the GTP header and its extra options with specified flags. - FIELD_MODE_DUMMY - dummy field, optionally used for byte boundary alignment in pattern. Pattern mask and data are ignored in the match. All configuration parameters besides field size are ignored. The offset mode list can be extended by vendors according to hardware supported options. Signed-off-by: Viacheslav Ovsiienko --- lib/ethdev/rte_flow.h | 218 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 218 insertions(+) diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h index 70f455d47d..589fe513bf 100644 --- a/lib/ethdev/rte_flow.h +++ b/lib/ethdev/rte_flow.h @@ -573,6 +573,15 @@ enum rte_flow_item_type { * @see struct rte_flow_item_conntrack. */ RTE_FLOW_ITEM_TYPE_CONNTRACK, + + /** + * Matches a configured set of fields at runtime calculated offsets + * over the generic network header with variable length and + * flexible pattern + * + * See struct rte_flow_item_flex. + */ + RTE_FLOW_ITEM_TYPE_FLEX, }; /** @@ -1839,6 +1848,150 @@ struct rte_flow_item { const void *mask; /**< Bit-mask applied to spec and last. */ }; +/** + * @warning + * @b EXPERIMENTAL: this structure may change without prior notice + * + * RTE_FLOW_ITEM_TYPE_FLEX + * + * Matches a specified set of fields within the network protocol + * header. Each field is presented as set of bits with specified width, and + * bit offset (this is dynamic one - can be calulated by several methods + * in runtime) from the header beginning. + * + * The pattern is concatenation of all bit fields configured at item creation + * by rte_flow_flex_item_create() exactly in the same order and amount, no + * fields can be omitted or swapped. The dummy mode field can be used for + * pattern byte boundary alignment, least significant bit in byte goes first. + * Only the fields specified in sample_data configuration parameter participate + * in pattern construction. + * + * If pattern length is smaller than configured fields overall length it is + * extended with trailing zeroes, both for value and mask. + * + * This type does not support ranges (struct rte_flow_item.last). + */ +struct rte_flow_item_flex { + struct rte_flow_item_flex_handle *handle; /**< Opaque item handle. */ + uint32_t length; /**< Pattern length in bytes. */ + const uint8_t *pattern; /**< Combined bitfields pattern to match. */ +}; + +/** + * Field bit offset calculation mode. + */ +enum rte_flow_item_flex_field_mode { + /** + * Dummy field, used for byte boundary alignment in pattern. + * Pattern mask and data are ignored in the match. All configuration + * parameters besides field size are ignored. + */ + FIELD_MODE_DUMMY = 0, + /** + * Fixed offset field. The bit offset from header beginning is + * is permanent and defined by field_base parameter. + */ + FIELD_MODE_FIXED, + /** + * The field bit offset is extracted from other header field (indirect + * offset field). The resulting field offset to match is calculated as: + * + * field_base + (*field_offset & offset_mask) << field_shift + */ + FIELD_MODE_OFFSET, + /** + * The field bit offset is extracted from other header field (indirect + * offset field), the latter is considered as bitmask containing some + * number of one bits, the resulting field offset to match is + * calculated as: + * + * field_base + bitcount(*field_offset & offset_mask) << field_shift + */ + FIELD_MODE_BITMASK, +}; + +/** + * @warning + * @b EXPERIMENTAL: this structure may change without prior notice + */ +struct rte_flow_item_flex_field { + /** Defines how match field offset is calculated over the packet. */ + enum rte_flow_item_flex_field_mode field_mode; + uint32_t field_size; /**< Match field size in bits. */ + int32_t field_base; /**< Match field offset in bits. */ + uint32_t offset_base; /**< Indirect offset field offset in bits. */ + uint32_t offset_mask; /**< Indirect offset field bit mask. */ + int32_t offset_shift; /**< Indirect offset multiply factor. */ + uint32_t tunnel_count; /**< 0-first occurrence, 1-outer, 2-inner.*/ + uint32_t field_id; /**< device hint, for flows with multiple items. */ +}; + +/** + * @warning + * @b EXPERIMENTAL: this structure may change without prior notice + */ +struct rte_flow_item_flex_link { + /** + * Preceding/following header. The item type must be always provided. + * For preceding one item must specify the header value/mask to match + * for the link be taken and start the flex item header parsing. + */ + struct rte_flow_item item; + /** + * Next field value to match to continue with one of the configured + * next protocols. + */ + uint32_t next; + /** + * Specifies whether flex item represents tunnel protocol + */ + bool tunnel; +}; + +/** + * @warning + * @b EXPERIMENTAL: this structure may change without prior notice + */ +struct rte_flow_item_flex_conf { + /** + * The next header offset, it presents the network header size covered + * by the flex item and can be obtained with all supported offset + * calculating methods (fixed, dedicated field, bitmask, etc). + */ + struct rte_flow_item_flex_field *next_header; + /** + * Specifies the next protocol field to match with link next protocol + * values and continue packet parsing with matching link. + */ + struct rte_flow_item_flex_field *next_protocol; + /** + * The fields will be sampled and presented for explicit match + * with pattern in the rte_flow_flex_item. There can be multiple + * fields descriptors, the number should be specified by sample_num. + */ + struct rte_flow_item_flex_field *sample_data; + /** Number of field descriptors in the sample_data array. */ + uint32_t sample_num; + /** + * Input link defines the flex item relation with preceding + * header. It specified the preceding item type and provides pattern + * to match. The flex item will continue parsing and will provide the + * data to flow match in case if there is the match with one of input + * links. + */ + struct rte_flow_item_flex_link *input_link; + /** Number of link descriptors in the input link array. */ + uint32_t input_num; + /** + * Output link defines the next protocol field value to match and + * the following protocol header to continue packet parsing. Also + * defines the tunnel-related behaviour. + */ + struct rte_flow_item_flex_link *output_link; + /** Number of link descriptors in the output link array. */ + uint32_t output_num; +}; + /** * Action types. * @@ -4288,6 +4441,71 @@ rte_flow_tunnel_item_release(uint16_t port_id, struct rte_flow_item *items, uint32_t num_of_items, struct rte_flow_error *error); + +/** + * Create the flex item with specified configuration over + * the Ethernet device. + * + * @param port_id + * Port identifier of Ethernet device. + * @param[in] conf + * Item configuration. + * @param[out] error + * Perform verbose error reporting if not NULL. PMDs initialize this + * structure in case of error only. + * + * @return + * Non-NULL opaque pointer on success, NULL otherwise and rte_errno is set. + */ +__rte_experimental +struct rte_flow_item_flex_handle * +rte_flow_flex_item_create(uint16_t port_id, + const struct rte_flow_item_flex_conf *conf, + struct rte_flow_error error); + +/** + * Release the flex item on the specified Ethernet device. + * + * @param port_id + * Port identifier of Ethernet device. + * @param[in] handle + * Handle of the item existing on the specified device. + * @param[out] error + * Perform verbose error reporting if not NULL. PMDs initialize this + * structure in case of error only. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + */ +__rte_experimental +int +rte_flow_flex_item_release(uint16_t port_id, + const struct rte_flow_item_flex_handle *handle, + struct rte_flow_error error); + +/** + * Modify the flex item on the specified Ethernet device. + * + * @param port_id + * Port identifier of Ethernet device. + * @param[in] handle + * Handle of the item existing on the specified device. + * @param[in] conf + * Item new configuration. + * @param[out] error + * Perform verbose error reporting if not NULL. PMDs initialize this + * structure in case of error only. + * + * @return + * 0 on success, a negative errno value otherwise and rte_errno is set. + */ +__rte_experimental +int +rte_flow_flex_item_update(uint16_t port_id, + const struct rte_flow_item_flex_handle *handle, + const struct rte_flow_item_flex_conf *conf, + struct rte_flow_error error); + #ifdef __cplusplus } #endif -- 2.18.1