From: Bing Zhao
To: Ori Kam, NBU-Contact-Thomas Monjalon, "ferruh.yigit@intel.com", "arybchenko@solarflare.com", "dev@dpdk.org"
Date: Fri, 11 Sep 2020 04:51:44 +0000
Subject: [dpdk-dev] [RFC] introduce support for hairpin between two ports

In the current implementation, hairpin only supports single-port mode
(e.g. in the testpmd application): traffic is sent out from the same
port it arrived on. Some NICs have no such restriction, and there is
strong demand for a two-port hairpin mode in real-life use cases.

Two-port hairpin mode does not mean that hairpin is limited to two
ports in a single application, and today's single-port hairpin must
still be supported for compatibility. 'Two ports' means that the
ingress and egress ports of the traffic can be different. There is
also no requirement that
1. traffic from the same ingress port must go to the same egress port
2. traffic from a port that acts as 'egress' for other traffic flows
   must go to their 'ingress' port
The configuration should be flexible, and the traffic behavior is
decided by the rte flows.

Usually, all hairpin configuration except the flows should be done
during the startup phase, which means that each hairpin TXQ and its
peer RXQ should be bound together. This is feasible in single-port
mode and transparent to the application. In two-port mode, configuring
and binding the queues raises some problems.
1. Once the TXQ and RXQ belong to different ports, it is hard to
   configure the first port while the second port is not yet
   initialized. It is also not appropriate to configure the first port
   while the second one is starting.
2. Ports can be attached and detached dynamically, so hairpin between
   them should support dynamic configuration.

In two-port hairpin mode, the TXQ and RXQ belong to different ports. If
some actions need to be done on the TX side, the egress flow can be
inserted explicitly and managed separately from the RX part.
What's more, one egress flow can be shared by different ingress flows
from the same or different ports.

To satisfy these requirements, some changes to the current rte ethdev
and flow APIs are needed, and some new APIs will be introduced.

1. Data structures in 'rte_ethdev.h'

Two new members are added.

    struct rte_eth_hairpin_conf {
        uint16_t peer_count; /**< The number of peers. */
        struct rte_eth_hairpin_peer peers[RTE_ETH_MAX_HAIRPIN_PEERS];
        uint16_t tx_explicit;
        uint16_t manual_bind;
    };

'tx_explicit': If 0, the PMD will insert the egress flow implicitly.
If 1, the application will insert it itself.
'manual_bind': If 0, the PMD will try to bind the hairpin TXQ and its
RXQ peer automatically, as in today's single-port hairpin mode; this
is for backward compatibility. If 1, the manual bind API will be
called.

As today, the application should ensure that the hairpin peer
configurations of TX and RX do not conflict; the PMD may check them
internally.
For the new member 'tx_explicit', it is suggested that all queue pairs
from one ingress port to the same egress port use the same value, to
avoid confusion, as in RSS cases.
For the new member 'manual_bind', the same suggestion applies.
Support for the new members depends on the NIC's capabilities and on
real-life usage by the application.

2. New macros in 'rte_ethdev.h'

    RTE_ETH_HAIRPIN_BIND_AUTO        (0)
    RTE_ETH_HAIRPIN_BIND_MANUAL      (1)
    RTE_ETH_HAIRPIN_TXRULE_IMPLICIT  (0)
    RTE_ETH_HAIRPIN_TXRULE_EXPLICIT  (1)

These are used for the new members of 'struct rte_eth_hairpin_conf'.

3. New function APIs in 'rte_ethdev.h'

* int rte_eth_hairpin_bind(uint16_t tx_port, uint16_t rx_port)
* typedef int (*eth_hairpin_bind)(struct rte_eth_dev *dev,
                                  uint16_t rx_port);

This function binds the egress of one port to the ingress of its peer
port. If 'rx_port' is equal to RTE_MAX_ETHPORTS, all ports are
traversed to bind the hairpin egress queues to all of their configured
ingress queues.
The application needs to call it repeatedly, once per egress port, to
bind all egress ports.
It should be called after the hairpin queues are set up and the
devices are started. If 'manual_bind' is not set, there is no need to
call this API.
The PMD should provide a function pointer of type 'eth_hairpin_bind'
to apply the hardware settings in the driver.
A return value of 0 means success; a negative value is returned to
indicate the actual failure.

* int rte_eth_hairpin_unbind(uint16_t tx_port, uint16_t rx_port)
* typedef int (*eth_hairpin_unbind)(struct rte_eth_dev *dev,
                                    uint16_t rx_port);

This function unbinds the egress of one port from the ingress of its
peer port. Only one hairpin direction is unbound; unbinding the
opposite direction needs another call of this API.
If 'rx_port' is equal to RTE_MAX_ETHPORTS, all ports are traversed to
unbind the queues (if any). The application needs to call it
repeatedly, once per egress port, to unbind all egress ports.
The API can be called without stopping or closing the eth device, but
the application should ensure that the flows inserted for the hairpin
port pair are handled properly, so that the traffic behavior remains
predictable after unbinding. It is suggested to remove, on both ports,
all flows for the direction of the port pair being unbound.
The PMD should provide a function pointer of type 'eth_hairpin_unbind'
to apply the hardware settings in the driver.
A return value of 0 means success; a negative value is returned to
indicate the actual failure.

After unbinding, the bind API can be called again to re-enable the
hairpin. Reconfiguring the peers without closing the devices is not
supported for now.

4. New rte_flow item

* RTE_FLOW_ITEM_TYPE_TX_QUEUE

    struct rte_flow_item_tx_queue {
        uint32_t queue;
    };

This provides a new item to match on for an egress packet. In two-port
hairpin mode, since the TX rules can be inserted explicitly on the
egress port, it is hard to distinguish hairpin packets from packets
sent by software.
Even with metadata, this may require complex management.
Support for the new rte_flow item is optional and depends on the NIC's
capabilities.
With this item, a few wildcard rules can be inserted for hairpin to
support some common actions.

When switching to two-port hairpin mode with explicit TX rules,
metadata can be used to provide the 'connection' for a packet between
ingress and egress.
1. The packet header might be changed by NAT or DECAP on the ingress
   side, so the inner header or other parts may be different.
2. Different ingress flow rules can share the same egress rule to
   simplify rule management.

The rte_flow examples look like this (port 0 RX X -> port 1 TX Y):

    flow create 0 ingress group M pattern eth / ... / end
        actions queue index is X / set_meta data is V / end

X is the ingress hairpin queue index.

    flow create 1 egress group N pattern eth / meta data is V / end
        actions vxlan_encap / end
    flow create 1 egress group 0 pattern eth / tx_queue index is Y / end
        actions jump group N / end

Y is the egress hairpin queue index. This wildcard flow redirects all
the ethernet packets from hairpin TX queue Y to a specific group for
further handling. Meanwhile, other traffic sent from software is not
impacted by this wildcard rule.

To verify this in testpmd, some changes are also required.
1. During the startup phase, hairpin binding will use the chaining
   mode. E.g. if 3 ports are probed, hairpin traffic will be like this:
       port A -> port B, port B -> port C, port C -> port A
   If only a single port is probed:
       port A -> port A
2. The flow command line will add support for parsing the tx queue
   index pattern. Format: tx_queue index is UNSIGNED / ...

Thanks

Signed-off-by: Bing Zhao
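
The manual bind/unbind semantics and the testpmd chaining mode
described above can be sketched as a small self-contained mock. This
is an illustration only, not driver or ethdev code: every demo_* name,
DEMO_MAX_PORTS (standing in for RTE_MAX_ETHPORTS), the per-port state,
the choice of error codes and the treatment of auto-bind mode as a
no-op are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_PORTS 4 /* hypothetical stand-in for RTE_MAX_ETHPORTS */

/* Hypothetical per-port state; a real PMD would keep this in the driver. */
struct demo_port {
    int started;                   /* device has been started */
    int manual_bind;               /* 1: the application binds explicitly */
    uint8_t peer[DEMO_MAX_PORTS];  /* peer[rx] = 1: hairpin queues set up towards rx */
    uint8_t bound[DEMO_MAX_PORTS]; /* bound[rx] = 1: egress currently bound to rx */
};

struct demo_port ports[DEMO_MAX_PORTS];

/* Bind or unbind one direction only (tx_port egress -> rx_port ingress). */
static int demo_set_one(uint16_t tx_port, uint16_t rx_port, uint8_t bind)
{
    struct demo_port *tx = &ports[tx_port];

    if (!tx->started || !ports[rx_port].started)
        return -EBUSY;             /* assumed code: both devices must be started */
    if (!tx->peer[rx_port])
        return -EINVAL;            /* assumed code: no hairpin queues for this pair */
    tx->bound[rx_port] = bind;
    return 0;
}

/* Passing DEMO_MAX_PORTS as rx_port traverses every configured peer of tx_port. */
static int demo_hairpin_set(uint16_t tx_port, uint16_t rx_port, uint8_t bind)
{
    uint16_t p;
    int ret;

    if (tx_port >= DEMO_MAX_PORTS || rx_port > DEMO_MAX_PORTS)
        return -ENODEV;
    if (!ports[tx_port].manual_bind)
        return 0;                  /* assumption: auto mode makes the call a no-op */
    if (rx_port != DEMO_MAX_PORTS)
        return demo_set_one(tx_port, rx_port, bind);
    for (p = 0; p < DEMO_MAX_PORTS; p++) {
        if (!ports[tx_port].peer[p])
            continue;
        ret = demo_set_one(tx_port, p, bind);
        if (ret != 0)
            return ret;
    }
    return 0;
}

int demo_hairpin_bind(uint16_t tx_port, uint16_t rx_port)
{
    return demo_hairpin_set(tx_port, rx_port, 1);
}

int demo_hairpin_unbind(uint16_t tx_port, uint16_t rx_port)
{
    return demo_hairpin_set(tx_port, rx_port, 0);
}

/* Chaining mode: port 0 -> 1 -> ... -> n-1 -> 0, then bind each egress port. */
int demo_chain(uint16_t n)
{
    uint16_t p;
    int ret;

    assert(n >= 1 && n <= DEMO_MAX_PORTS);
    memset(ports, 0, sizeof(ports));
    for (p = 0; p < n; p++) {      /* phase 1: configure and start every port */
        ports[p].manual_bind = 1;
        ports[p].peer[(p + 1) % n] = 1;
        ports[p].started = 1;
    }
    for (p = 0; p < n; p++) {      /* phase 2: one bind call per egress port */
        ret = demo_hairpin_bind(p, DEMO_MAX_PORTS);
        if (ret != 0)
            return ret;
    }
    return 0;
}
```

Calling demo_chain(3) configures and starts all three ports before any
bind call is made, which mirrors the ordering problem the manual bind
API is meant to solve: no port needs its peer to be ready at queue
setup time. A later demo_hairpin_unbind(0, 1) only clears the
port 0 -> port 1 direction and leaves the other directions bound.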