From mboxrd@z Thu Jan 1 00:00:00 1970
From: Viacheslav Ovsiienko
To: shahafs@mellanox.com, yskoh@mellanox.com
Cc: dev@dpdk.org, Viacheslav Ovsiienko
Date: Mon, 15 Oct 2018 14:13:34 +0000
Message-Id: <1539612815-47199-7-git-send-email-viacheslavo@mellanox.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com>
References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com> <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain
Subject: [dpdk-dev] [PATCH v2 6/7] net/mlx5: e-switch VXLAN encapsulation rules management
List-Id: DPDK patches and discussions

VXLAN encap rules are applied to the VF ingress traffic and have the
VTEP as the actual redirection destination instead of the outer PF.
The encapsulation rule should provide:
- redirection action VF->PF
- VF port ID
- some inner network parameters (MACs/IP)
- the tunnel outer source IP (v4/v6)
- the tunnel outer destination IP (v4/v6). Current
- VNI - Virtual Network Identifier

No direct way was found to provide the kernel with all required
encapsulation header parameters. The encapsulation VTEP is created
attached to the outer interface and is assumed to be the default path
for egress encapsulated traffic. The outer tunnel IP addresses are
assigned to the interface using Netlink, and the implicit route is
created like this:

  ip addr add <src_ip> peer <dst_ip> dev <ifouter> scope link

The peer address provides the implicit route, and scope link reduces
the risk of conflicts. At initialization time all local scope link
addresses are flushed from the device (see the next part of the patchset).

The destination MAC address is provided via a permanent neigh rule:

  ip neigh add dev <ifouter> lladdr <dst_mac> to <dst_ip> nud permanent

At initialization time all neigh rules of this type are flushed
from the device (see the next part of the patchset).

Suggested-by: Adrien Mazarguil
Signed-off-by: Viacheslav Ovsiienko
---
 drivers/net/mlx5/mlx5_flow_tcf.c | 394 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 389 insertions(+), 5 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c
index efa9c3b..a1d7733 100644
--- a/drivers/net/mlx5/mlx5_flow_tcf.c
+++ b/drivers/net/mlx5/mlx5_flow_tcf.c
@@ -3443,6 +3443,376 @@ struct pedit_parser {
 	return -err;
 }
 
+/**
+ * Emit Netlink message to add/remove local address to the outer device.
+ * The address being added is visible within the link only (scope link).
+ *
+ * Note that an implicit route is maintained by the kernel due to the
+ * presence of a peer address (IFA_ADDRESS).
+ *
+ * These rules are used for encapsulation only and allow assigning
+ * the outer tunnel source IP address.
+ *
+ * @param[in] tcf
+ *   Libmnl socket context object.
+ * @param[in] encap
+ *   Encapsulation properties (source address and its peer).
+ * @param[in] ifindex
+ *   Network interface to apply rule.
+ * @param[in] enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_tcf_rule_local(struct mlx5_flow_tcf_context *tcf,
+		    const struct mlx5_flow_tcf_vxlan_encap *encap,
+		    unsigned int ifindex,
+		    bool enable,
+		    struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct ifaddrmsg *ifa;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ifa) + 128)];
+
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = enable ? RTM_NEWADDR : RTM_DELADDR;
+	nlh->nlmsg_flags =
+		NLM_F_REQUEST | (enable ? NLM_F_CREATE | NLM_F_REPLACE : 0);
+	nlh->nlmsg_seq = 0;
+	ifa = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifa));
+	ifa->ifa_flags = IFA_F_PERMANENT;
+	ifa->ifa_scope = RT_SCOPE_LINK;
+	ifa->ifa_index = ifindex;
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC) {
+		ifa->ifa_family = AF_INET;
+		ifa->ifa_prefixlen = 32;
+		mnl_attr_put_u32(nlh, IFA_LOCAL, encap->ipv4.src);
+		if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST)
+			mnl_attr_put_u32(nlh, IFA_ADDRESS,
+					 encap->ipv4.dst);
+	} else {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_SRC);
+		ifa->ifa_family = AF_INET6;
+		ifa->ifa_prefixlen = 128;
+		mnl_attr_put(nlh, IFA_LOCAL,
+			     sizeof(encap->ipv6.src),
+			     &encap->ipv6.src);
+		if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST)
+			mnl_attr_put(nlh, IFA_ADDRESS,
+				     sizeof(encap->ipv6.dst),
+				     &encap->ipv6.dst);
+	}
+	if (!flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: cannot complete IFA request (ip addr add)");
+}
+
+/**
+ * Emit Netlink message to add/remove neighbor.
+ *
+ * @param[in] tcf
+ *   Libmnl socket context object.
+ * @param[in] encap
+ *   Encapsulation properties (destination address).
+ * @param[in] ifindex
+ *   Network interface.
+ * @param[in] enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_tcf_rule_neigh(struct mlx5_flow_tcf_context *tcf,
+		    const struct mlx5_flow_tcf_vxlan_encap *encap,
+		    unsigned int ifindex,
+		    bool enable,
+		    struct rte_flow_error *error)
+{
+	struct nlmsghdr *nlh;
+	struct ndmsg *ndm;
+	alignas(struct nlmsghdr)
+	uint8_t buf[mnl_nlmsg_size(sizeof(*ndm) + 128)];
+
+	nlh = mnl_nlmsg_put_header(buf);
+	nlh->nlmsg_type = enable ? RTM_NEWNEIGH : RTM_DELNEIGH;
+	nlh->nlmsg_flags =
+		NLM_F_REQUEST | (enable ? NLM_F_CREATE | NLM_F_REPLACE : 0);
+	nlh->nlmsg_seq = 0;
+	ndm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ndm));
+	ndm->ndm_ifindex = ifindex;
+	ndm->ndm_state = NUD_PERMANENT;
+	ndm->ndm_flags = 0;
+	ndm->ndm_type = 0;
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST) {
+		ndm->ndm_family = AF_INET;
+		mnl_attr_put_u32(nlh, NDA_DST, encap->ipv4.dst);
+	} else {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST);
+		ndm->ndm_family = AF_INET6;
+		mnl_attr_put(nlh, NDA_DST, sizeof(encap->ipv6.dst),
+			     &encap->ipv6.dst);
+	}
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_ETH_SRC && enable)
+		DRV_LOG(WARNING,
+			"Outer ethernet source address cannot be "
+			"forced for VXLAN encapsulation");
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_ETH_DST)
+		mnl_attr_put(nlh, NDA_LLADDR, sizeof(encap->eth.dst),
+			     &encap->eth.dst);
+	if (!flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL))
+		return 0;
+	return rte_flow_error_set
+		(error, rte_errno, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+		 "netlink: cannot complete ND request (ip neigh)");
+}
+
+/**
+ * Manage the local IP addresses and their peer IP addresses on the
+ * outer interface for encapsulation purposes. The kernel searches the
+ * appropriate device for tunnel egress traffic using the outer source
+ * IP, this IP should be assigned to the outer network device, otherwise
+ * the kernel rejects the rule.
+ *
+ * Adds or removes the addresses using the Netlink command like this:
+ *   ip addr add <src_ip> peer <dst_ip> scope link dev <ifouter>
+ *
+ * The addresses are local to the netdev ("scope link"), this reduces
+ * the risk of conflicts. Note that an implicit route is maintained by
+ * the kernel due to the presence of a peer address (IFA_ADDRESS).
+ *
+ * @param[in] tcf
+ *   Libmnl socket context object.
+ * @param[in] vtep
+ *   VTEP object, contains rule database and ifouter index.
+ * @param[in] dev_flow
+ *   Flow object, contains the tunnel parameters (for encap only).
+ * @param[in] enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_tcf_encap_local(struct mlx5_flow_tcf_context *tcf,
+		     struct mlx5_flow_tcf_vtep *vtep,
+		     struct mlx5_flow *dev_flow,
+		     bool enable,
+		     struct rte_flow_error *error)
+{
+	const struct mlx5_flow_tcf_vxlan_encap *encap =
+						dev_flow->tcf.vxlan_encap;
+	struct tcf_local_rule *rule;
+	bool found = false;
+	int ret;
+
+	assert(encap);
+	assert(encap->hdr.type == MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP);
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC) {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST);
+		LIST_FOREACH(rule, &vtep->local, next) {
+			if (rule->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC &&
+			    encap->ipv4.src == rule->ipv4.src &&
+			    encap->ipv4.dst == rule->ipv4.dst) {
+				found = true;
+				break;
+			}
+		}
+	} else {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_SRC);
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST);
+		LIST_FOREACH(rule, &vtep->local, next) {
+			if (rule->mask & MLX5_FLOW_TCF_ENCAP_IPV6_SRC &&
+			    !memcmp(&encap->ipv6.src, &rule->ipv6.src,
+				    sizeof(encap->ipv6.src)) &&
+			    !memcmp(&encap->ipv6.dst, &rule->ipv6.dst,
+				    sizeof(encap->ipv6.dst))) {
+				found = true;
+				break;
+			}
+		}
+	}
+	if (found) {
+		if (enable) {
+			rule->refcnt++;
+			return 0;
+		}
+		if (!rule->refcnt || !--rule->refcnt) {
+			LIST_REMOVE(rule, next);
+			return flow_tcf_rule_local(tcf, encap,
+					vtep->ifouter, false, error);
+		}
+		return 0;
+	}
+	if (!enable) {
+		DRV_LOG(WARNING, "Disabling nonexistent local rule");
+		rte_flow_error_set
+			(error, ENOENT, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "Disabling nonexistent local rule");
+		return -ENOENT;
+	}
+	rule = rte_zmalloc(__func__, sizeof(struct tcf_local_rule),
+			   alignof(struct tcf_local_rule));
+	if (!rule) {
+		rte_flow_error_set
+			(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "unable to allocate memory for local rule");
+		return -rte_errno;
+	}
+	*rule = (struct tcf_local_rule){.refcnt = 0,
+					.mask = 0,
+					};
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC) {
+		rule->mask = MLX5_FLOW_TCF_ENCAP_IPV4_SRC
+			   | MLX5_FLOW_TCF_ENCAP_IPV4_DST;
+		rule->ipv4.src = encap->ipv4.src;
+		rule->ipv4.dst = encap->ipv4.dst;
+	} else {
+		rule->mask = MLX5_FLOW_TCF_ENCAP_IPV6_SRC
+			   | MLX5_FLOW_TCF_ENCAP_IPV6_DST;
+		memcpy(&rule->ipv6.src, &encap->ipv6.src,
+		       sizeof(rule->ipv6.src));
+		memcpy(&rule->ipv6.dst, &encap->ipv6.dst,
+		       sizeof(rule->ipv6.dst));
+	}
+	ret = flow_tcf_rule_local(tcf, encap, vtep->ifouter, true, error);
+	if (ret) {
+		rte_free(rule);
+		return ret;
+	}
+	rule->refcnt++;
+	LIST_INSERT_HEAD(&vtep->local, rule, next);
+	return 0;
+}
+
+/**
+ * Manage the destination MAC/IP address neigh database; the kernel uses
+ * it to determine the destination MAC address within the encapsulation
+ * header. Adds or removes the entries using the Netlink command like this:
+ *   ip neigh add dev <ifouter> lladdr <dst_mac> to <dst_ip> nud permanent
+ *
+ * @param[in] tcf
+ *   Libmnl socket context object.
+ * @param[in] vtep
+ *   VTEP object, contains rule database and ifouter index.
+ * @param[in] dev_flow
+ *   Flow object, contains the tunnel parameters (for encap only).
+ * @param[in] enable
+ *   Toggle between add and remove.
+ * @param[out] error
+ *   Perform verbose error reporting if not NULL.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+static int
+flow_tcf_encap_neigh(struct mlx5_flow_tcf_context *tcf,
+		     struct mlx5_flow_tcf_vtep *vtep,
+		     struct mlx5_flow *dev_flow,
+		     bool enable,
+		     struct rte_flow_error *error)
+{
+	const struct mlx5_flow_tcf_vxlan_encap *encap =
+						dev_flow->tcf.vxlan_encap;
+	struct tcf_neigh_rule *rule;
+	bool found = false;
+	int ret;
+
+	assert(encap);
+	assert(encap->hdr.type == MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP);
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST) {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_SRC);
+		LIST_FOREACH(rule, &vtep->neigh, next) {
+			if (rule->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST &&
+			    encap->ipv4.dst == rule->ipv4.dst) {
+				found = true;
+				break;
+			}
+		}
+	} else {
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_SRC);
+		assert(encap->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST);
+		LIST_FOREACH(rule, &vtep->neigh, next) {
+			if (rule->mask & MLX5_FLOW_TCF_ENCAP_IPV6_DST &&
+			    !memcmp(&encap->ipv6.dst, &rule->ipv6.dst,
+				    sizeof(encap->ipv6.dst))) {
+				found = true;
+				break;
+			}
+		}
+	}
+	if (found) {
+		if (memcmp(&encap->eth.dst, &rule->eth,
+			   sizeof(encap->eth.dst))) {
+			DRV_LOG(WARNING, "Destination MAC differs"
+					 " in neigh rule");
+			rte_flow_error_set(error, EEXIST,
+					   RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+					   NULL, "Different MAC address"
+					   " neigh rule for the same"
+					   " destination IP");
+			return -EEXIST;
+		}
+		if (enable) {
+			rule->refcnt++;
+			return 0;
+		}
+		if (!rule->refcnt || !--rule->refcnt) {
+			LIST_REMOVE(rule, next);
+			return flow_tcf_rule_neigh(tcf, encap,
+						   vtep->ifouter,
+						   false, error);
+		}
+		return 0;
+	}
+	if (!enable) {
+		DRV_LOG(WARNING, "Disabling nonexistent neigh rule");
+		rte_flow_error_set
+			(error, ENOENT, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "Disabling nonexistent neigh rule");
+		return -ENOENT;
+	}
+	rule = rte_zmalloc(__func__, sizeof(struct tcf_neigh_rule),
+			   alignof(struct tcf_neigh_rule));
+	if (!rule) {
+		rte_flow_error_set
+			(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
+			 NULL, "unable to allocate memory for neigh rule");
+		return -rte_errno;
+	}
+	*rule = (struct tcf_neigh_rule){.refcnt = 0,
+					.mask = 0,
+					};
+	if (encap->mask & MLX5_FLOW_TCF_ENCAP_IPV4_DST) {
+		rule->mask = MLX5_FLOW_TCF_ENCAP_IPV4_DST;
+		rule->ipv4.dst = encap->ipv4.dst;
+	} else {
+		rule->mask = MLX5_FLOW_TCF_ENCAP_IPV6_DST;
+		memcpy(&rule->ipv6.dst, &encap->ipv6.dst,
+		       sizeof(rule->ipv6.dst));
+	}
+	memcpy(&rule->eth, &encap->eth.dst, sizeof(rule->eth));
+	ret = flow_tcf_rule_neigh(tcf, encap, vtep->ifouter, true, error);
+	if (ret) {
+		rte_free(rule);
+		return ret;
+	}
+	rule->refcnt++;
+	LIST_INSERT_HEAD(&vtep->neigh, rule, next);
+	return 0;
+}
+
 /* VTEP device list is shared between PMD port instances. */
 static LIST_HEAD(, mlx5_flow_tcf_vtep)
 			vtep_list_vxlan = LIST_HEAD_INITIALIZER();
@@ -3715,6 +4085,7 @@ static LIST_HEAD(, mlx5_flow_tcf_vtep)
 {
 	static uint16_t encap_port = MLX5_VXLAN_PORT_RANGE_MIN - 1;
 	struct mlx5_flow_tcf_vtep *vtep, *vlst;
+	int ret;
 
 	assert(ifouter);
 	/* Look whether the attached VTEP for encap is created. */
@@ -3766,6 +4137,21 @@ static LIST_HEAD(, mlx5_flow_tcf_vtep)
 	}
 	if (!vtep)
 		return 0;
+	/* Create local ipaddr with peer to specify the outer IPs. */
+	ret = flow_tcf_encap_local(tcf, vtep, dev_flow, true, error);
+	if (ret) {
+		if (!vtep->refcnt)
+			flow_tcf_delete_iface(tcf, vtep);
+		return 0;
+	}
+	/* Create neigh rule to specify outer destination MAC. */
+	ret = flow_tcf_encap_neigh(tcf, vtep, dev_flow, true, error);
+	if (ret) {
+		flow_tcf_encap_local(tcf, vtep, dev_flow, false, error);
+		if (!vtep->refcnt)
+			flow_tcf_delete_iface(tcf, vtep);
+		return 0;
+	}
 	vtep->refcnt++;
 	assert(vtep->ifindex);
 	return vtep->ifindex;
@@ -3848,11 +4234,9 @@ static LIST_HEAD(, mlx5_flow_tcf_vtep)
 	case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
 		break;
 	case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
-/*
- * TODO: Remove the encap ancillary rules first.
- * flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL);
- * flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL);
- */
+		/* Remove the encap ancillary rules first. */
+		flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL);
+		flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL);
 		break;
 	default:
 		assert(false);
-- 
1.8.3.1
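
Note on the reference counting used by flow_tcf_encap_local() and
flow_tcf_encap_neigh(): several flows may share one address or neigh
entry, so the kernel is only touched when the first user appears and
when the last one goes away. The following is a minimal standalone
sketch of that idea, not driver code. It assumes a plain libc with
<sys/queue.h>; the names demo_rule, demo_kernel_apply() and
demo_rule_toggle() are hypothetical, the Netlink request is replaced by
a call counter, only an IPv4 source/destination pair is modelled, and
(unlike the patch as posted) the sketch frees the entry once its last
reference is dropped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct tcf_local_rule (IPv4 pair only). */
struct demo_rule {
	LIST_ENTRY(demo_rule) next;
	uint32_t refcnt;
	uint32_t src;
	uint32_t dst;
};

LIST_HEAD(demo_rule_list, demo_rule);

/* Number of simulated kernel requests (assumption: no real Netlink). */
static int demo_kernel_calls;

/* Placeholder for the Netlink add/remove request; always succeeds. */
static int
demo_kernel_apply(uint32_t src, uint32_t dst, bool enable)
{
	(void)src;
	(void)dst;
	(void)enable;
	demo_kernel_calls++;
	return 0;
}

/* Add (enable) or drop (!enable) one reference to the src/dst rule. */
static int
demo_rule_toggle(struct demo_rule_list *list, uint32_t src, uint32_t dst,
		 bool enable)
{
	struct demo_rule *rule;
	int ret;

	/* rule is NULL after the loop when nothing matched. */
	LIST_FOREACH(rule, list, next)
		if (rule->src == src && rule->dst == dst)
			break;
	if (rule) {
		if (enable) {
			rule->refcnt++;
			return 0;
		}
		if (rule->refcnt && --rule->refcnt)
			return 0;
		/* Last reference dropped: unlink, free, tell the kernel. */
		LIST_REMOVE(rule, next);
		free(rule);
		return demo_kernel_apply(src, dst, false);
	}
	if (!enable)
		return -ENOENT;
	rule = calloc(1, sizeof(*rule));
	if (!rule)
		return -ENOMEM;
	rule->src = src;
	rule->dst = dst;
	ret = demo_kernel_apply(src, dst, true);
	if (ret) {
		free(rule);
		return ret;
	}
	rule->refcnt = 1;
	LIST_INSERT_HEAD(list, rule, next);
	return 0;
}
```

Enabling the same pair twice and disabling it twice would issue two
simulated kernel requests in total (one add, one delete); disabling a
pair that was never enabled returns -ENOENT, mirroring the error path
in the patch.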