From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yongseok Koh
To: Slava Ovsiienko
CC: Shahaf Shuler, "dev@dpdk.org"
Date: Thu, 1 Nov 2018 20:32:08 +0000
Message-ID: <20181101203200.GA6118@mtidpdk.mti.labs.mlnx>
References: <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com>
 <1541074741-41368-1-git-send-email-viacheslavo@mellanox.com>
In-Reply-To: <1541074741-41368-1-git-send-email-viacheslavo@mellanox.com>
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id:
DPDK patches and discussions
X-List-Received-Date: Thu, 01 Nov 2018 20:32:11 -0000

On Thu, Nov 01, 2018 at 05:19:21AM -0700, Slava Ovsiienko wrote:
> This patchset adds the VXLAN encapsulation/decapsulation hardware
> offload feature for E-Switch.
>
> A typical use case of tunneling infrastructure is port representors
> in switchdev mode, with VXLAN traffic encapsulation performed on
> traffic coming *from* a representor and decapsulation on traffic
> going *to* that representor, in order to transparently assign
> a given VXLAN to VF traffic.
>
> Since these actions are supported at the E-Switch level, the "transfer"
> attribute must be set on such flow rules. They must also be combined
> with a port redirection action to make sense.
>
> Since only ingress is supported, encapsulation flow rules are normally
> applied on a physical port and emit traffic to a port representor.
> The opposite order is used for decapsulation.
>
> Like other mlx5 E-Switch flow rule actions, these ones are implemented
> through Linux's TC flower API. Since the Linux interface for VXLAN
> encap/decap involves virtual network devices (i.e. ip link add type
> vxlan [...]), the PMD dynamically spawns them as needed through
> Netlink calls. These implicitly created VXLAN devices are called
> VTEPs (Virtual Tunnel End Points).
>
> VXLAN interfaces are dynamically created for each local port of
> outer networks and then used as targets for TC "flower" filters
> in order to perform encapsulation. For decapsulation the VXLAN
> devices are created for each unique UDP port.
> These VXLAN interfaces are system-wide: only one device with a given
> UDP port can exist in the system (an attempt to create another device
> with the same local UDP port returns EEXIST), so the PMD should support
> a device database shared between PMD instances.
>
> Rule sample considerations:
>
> $PF - physical device, outer network
> $VF - representor for VF, outer/inner network
> $VXLAN - VTEP netdev name
> $PF_OUTER_IP - $PF IP (v4 or v6) within outer network
> $REMOTE_IP - remote peer IP (v4 or v6) within outer network
> $LOCAL_PORT - local UDP port
> $REMOTE_PORT - remote UDP port
>
> VXLAN VTEP creation with iproute2 (PMD does the same via Netlink):
>
> - for encapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external dev $PF
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT for egress encapsulated traffic (note, this is not the
> source UDP port in the VXLAN header, it is just the UDP port assigned
> to the VTEP, with no practical usage) is selected automatically from
> the available UDP ports in the range 30000-60000.
>
> - for decapsulation:
>
>   ip link add $VXLAN type vxlan dstport $LOCAL_PORT external
>   ip link set dev $VXLAN up
>   tc qdisc del dev $VXLAN ingress
>   tc qdisc add dev $VXLAN ingress
>
> $LOCAL_PORT is the UDP port receiving the VXLAN traffic from outer networks.
>
> All ingress UDP traffic with the given UDP destination port from ALL
> existing netdevs is routed by the kernel to the $VXLAN net device. While
> applying the rule, the kernel checks the IP parameters within the rule,
> determines the appropriate underlying PF and tries to set up the rule
> hardware offload.
>
> VXLAN encapsulation
>
> VXLAN encap rules are applied to the VF ingress traffic and have the
> VTEP as actual redirection destinations instead of outer PF.
> The encapsulation rule should provide:
> - redirection action VF->PF
> - VF port ID
> - some inner network parameters (MACs)
> - the tunnel outer source IP (v4/v6), (IS A MUST)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN encapsulation rule sample for tc utility:
>
>   tc filter add dev $VF protocol all parent ffff: flower skip_sw \
>     action tunnel_key set dst_port $REMOTE_PORT \
>     src_ip $PF_OUTER_IP dst_ip $REMOTE_IP id $VNI \
>     action mirred egress redirect dev $VXLAN
>
> VXLAN encapsulation rule sample for testpmd:
>
> - Setting up outer properties of VXLAN tunnel:
>
>   set vxlan ip-version ipv4 vni $VNI \
>     udp-src $IGNORED udp-dst $REMOTE_PORT \
>     ip-src $PF_OUTER_IP ip-dst $REMOTE_IP \
>     eth-src $IGNORED eth-dst $REMOTE_MAC
>
> - Creating a flow rule on port ID 4 performing VXLAN encapsulation
>   with the abovementioned properties and directing the resulting
>   traffic to port ID 0:
>
>   flow create 4 ingress transfer pattern eth src is $INNER_MAC / end
>     actions vxlan_encap / port_id id 0 / end
>
> No direct way was found to provide the kernel with all required
> encapsulation header parameters. The encapsulation VTEP is created
> attached to the outer interface, which is assumed to be the default
> path for egress encapsulated traffic. The outer tunnel IP addresses
> are assigned to the interface using Netlink, and the implicit route
> is created like this:
>
>   ip addr add peer dev scope link
>
> The peer address option provides the implicit route, and the scope link
> attribute reduces the risk of conflicts. At initialization time all
> local scope link addresses are flushed from the outer network device.
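
As a side note on the mandatory encapsulation parameters listed above,
here is a minimal C sketch of the kind of presence check a validation
routine could perform. The flag names and encap_check_fields() are
hypothetical and not taken from the patchset; the sketch only assumes
what the log states, namely that the outer source IP, the outer
destination IP and the VNI are required.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flags recording which outer-header fields a rule supplies. */
enum encap_field {
	ENCAP_F_SRC_IP   = 1u << 0, /* tunnel outer source IP (v4 or v6) */
	ENCAP_F_DST_IP   = 1u << 1, /* tunnel outer destination IP (v4 or v6) */
	ENCAP_F_VNI      = 1u << 2, /* Virtual Network Identifier */
	ENCAP_F_DST_PORT = 1u << 3, /* remote UDP port, not marked as a must */
};

/* The three fields the log marks as "IS A MUST" for encapsulation. */
#define ENCAP_MANDATORY (ENCAP_F_SRC_IP | ENCAP_F_DST_IP | ENCAP_F_VNI)

/* Return 0 if every mandatory field is present, -EINVAL otherwise. */
int
encap_check_fields(uint32_t present)
{
	if ((present & ENCAP_MANDATORY) != ENCAP_MANDATORY)
		return -EINVAL;
	return 0;
}
```

A rule carrying only the two outer IPs would be rejected by this check,
while adding optional fields on top of the mandatory set would not
change the result.
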
>
> The destination MAC address is provided via a permanent neigh rule:
>
>   ip neigh add dev lladdr to nud permanent
>
> At initialization time all neigh rules of permanent type are flushed
> from the outer network device.
>
> VXLAN decapsulation
>
> VXLAN decap rules are applied to the ingress traffic of the VTEP ($VXLAN)
> device instead of the PF. The decapsulation rule should provide:
> - redirection action PF->VF
> - VF port ID as redirection destination
> - $VXLAN device as ingress traffic source
> - the tunnel outer source IP (v4/v6), (optional)
> - the tunnel outer destination IP (v4/v6), (IS A MUST)
> - the tunnel local UDP port (IS A MUST, PMD looks for appropriate VTEP
>   with given local UDP port)
> - VNI - Virtual Network Identifier (IS A MUST)
>
> VXLAN decap rule sample for tc utility:
>
>   tc filter add dev $VXLAN protocol all parent ffff: flower skip_sw \
>     enc_src_ip $REMOTE_IP enc_dst_ip $PF_OUTER_IP enc_key_id $VNI \
>     enc_dst_port $LOCAL_PORT \
>     action tunnel_key unset action mirred egress redirect dev $VF
>
> VXLAN decap rule sample for testpmd:
>
> - Creating a flow on port ID 0 performing VXLAN decapsulation and directing
>   the result to port ID 4 with checking inner properties:
>
>   flow create 0 ingress transfer pattern /
>     ipv4 src is $REMOTE_IP dst is $PF_LOCAL_IP /
>     udp src is 9999 dst is $LOCAL_PORT / vxlan vni is $VNI /
>     eth src is 00:11:22:33:44:55 dst is $INNER_MAC / end
>     actions vxlan_decap / port_id id 4 / end
>
> The VXLAN encap/decap rule constraints (implied by current kernel support):
>
> - VXLAN decapsulation provided for PF->VF direction only
> - VXLAN encapsulation provided for VF->PF direction only
> - current implementation will support non-shared database of VTEPs
>   (impossible simultaneous usage of the same UDP port by several
>   instances of DPDK apps)
>
> Suggested-by: Adrien Mazarguil
> Signed-off-by: Viacheslav Ovsiienko
> ---

Excellent commit log!! One nit.
Please change e-switch in the title/log to E-Switch.

Thanks,
Yongseok

> v3:
> * patchset is resplit into more dedicated parts
> * decapsulation rule takes MAC from inner eth item
> * appropriate RTE_BEx are replaced with runtime rte_cpu_xxx
> * E-Switch Flow counter deletion is fixed
> * VTEP management routines are refactored
> * found typos are corrected
>
> v2:
> * removed non-VXLAN related parts
> * multipart Netlink messages support
> * local IP and peer IP rules management
> * neigh IP address to MAC address rules
> * management rules cleanup at outer device initialization
> * attached devices cleanup at outer device initialization
>
> v1:
> * http://patches.dpdk.org/patch/45800/
> * Refactored code of initial experimental proposal
>
> v0:
> * http://patches.dpdk.org/cover/44080/
> * Initial proposal by Adrien Mazarguil
>
> Viacheslav Ovsiienko (13):
>   net/mlx5: prepare makefile for adding e-switch VXLAN
>   net/mlx5: prepare meson.build for adding e-switch VXLAN
>   net/mlx5: add necessary definitions for e-switch VXLAN
>   net/mlx5: add necessary structures for e-switch VXLAN
>   net/mlx5: swap items/actions validations for e-switch rules
>   net/mlx5: add e-switch VXLAN support to validation routine
>   net/mlx5: add VXLAN support to flow prepare routine
>   net/mlx5: add VXLAN support to flow translate routine
>   net/mlx5: e-switch VXLAN netlink routines update
>   net/mlx5: fix e-switch Flow counter deletion
>   net/mlx5: add e-switch VXLAN tunnel devices management
>   net/mlx5: add e-switch VXLAN encapsulation rules
>   net/mlx5: add e-switch VXLAN rule cleanup routines
>
>  drivers/net/mlx5/Makefile        |   85 +
>  drivers/net/mlx5/meson.build     |   34 +
>  drivers/net/mlx5/mlx5_flow.h     |   11 +
>  drivers/net/mlx5/mlx5_flow_tcf.c | 5118 +++++++++++++++++++++++++++++---------
>  4 files changed, 4107 insertions(+), 1141 deletions(-)
>
> -- 
> 1.8.3.1
>
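
To illustrate the VTEP bookkeeping the log describes (one VTEP per
local UDP port, with encapsulation ports picked from 30000-60000), a
rough C sketch follows. All names (struct vtep, vtep_acquire(),
vtep_release(), vtep_pick_encap_port()) and the fixed table size are
hypothetical and are not the patchset's code; the netdev creation
itself is only marked by a comment. The table is process-local, which
matches the "non-shared database" constraint: it cannot see ports held
by another DPDK instance.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define VTEP_MAX      8      /* hypothetical table size */
#define VTEP_PORT_MIN 30000  /* encapsulation port range from the log */
#define VTEP_PORT_MAX 60000

struct vtep {
	uint16_t port;   /* local UDP port, unique per VTEP */
	uint32_t refcnt; /* flow rules using this VTEP, 0 means free slot */
};

static struct vtep vtep_tbl[VTEP_MAX];

/* Find the VTEP bound to a UDP port, or return NULL. */
struct vtep *
vtep_lookup(uint16_t port)
{
	for (size_t i = 0; i < VTEP_MAX; i++)
		if (vtep_tbl[i].refcnt && vtep_tbl[i].port == port)
			return &vtep_tbl[i];
	return NULL;
}

/* Take a reference on the VTEP for a port, creating it on first use. */
struct vtep *
vtep_acquire(uint16_t port)
{
	struct vtep *v = vtep_lookup(port);

	if (v) {
		v->refcnt++; /* reuse instead of hitting EEXIST */
		return v;
	}
	for (size_t i = 0; i < VTEP_MAX; i++) {
		if (vtep_tbl[i].refcnt == 0) {
			/* A real PMD would create the netdev via Netlink here. */
			vtep_tbl[i].port = port;
			vtep_tbl[i].refcnt = 1;
			return &vtep_tbl[i];
		}
	}
	errno = ENOSPC;
	return NULL;
}

/* Drop a reference; the slot is reusable once the last one is gone. */
void
vtep_release(struct vtep *v)
{
	assert(v != NULL && v->refcnt > 0);
	v->refcnt--;
}

/* Pick the first port in the encapsulation range not used locally, 0 if none. */
uint16_t
vtep_pick_encap_port(void)
{
	for (uint32_t p = VTEP_PORT_MIN; p <= VTEP_PORT_MAX; p++)
		if (vtep_lookup((uint16_t)p) == NULL)
			return (uint16_t)p;
	return 0;
}
```

Two decap rules on the same UDP port would share one entry here, and
the entry would only go away after both rules are destroyed.
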