From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR03-DB5-obe.outbound.protection.outlook.com (mail-eopbgr40084.outbound.protection.outlook.com [40.107.4.84]) by dpdk.org (Postfix) with ESMTP id 074511F1C for ; Mon, 29 Oct 2018 12:53:37 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Mellanox.com; s=selector1; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=3ztC1jmlycT6+Zq/J4glnMYGmBslpgkqzq03qKBRzkM=; b=FyXsC6wz7BJdjVckSDHxopyLjrPWeMd9SoZdlqnxIMEtYMTNicX3edK5E6ynkOHJONFBG75bpAzWpYfeNe6MA9L+OsJvjQPx2NRsLnR/oRk4pDDMa0RDwGVUyutrDOYTpVC1wQVKKsf/ge51fZYzbJ8aX1XIpXJZQDpxKYhfu58= Received: from AM4PR05MB3265.eurprd05.prod.outlook.com (10.171.186.150) by AM4PR05MB1732.eurprd05.prod.outlook.com (10.165.246.19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1273.19; Mon, 29 Oct 2018 11:53:35 +0000 Received: from AM4PR05MB3265.eurprd05.prod.outlook.com ([fe80::544b:a68d:e6a5:ba6e]) by AM4PR05MB3265.eurprd05.prod.outlook.com ([fe80::544b:a68d:e6a5:ba6e%2]) with mapi id 15.20.1273.027; Mon, 29 Oct 2018 11:53:34 +0000 From: Slava Ovsiienko To: Yongseok Koh CC: Shahaf Shuler , "dev@dpdk.org" Thread-Topic: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management Thread-Index: AQHUZJFaIbp0XMz/YU2qGX8/bdbdC6UvKhsAgAFDBmCAALMzAIAALQ+wgADj9gCAA/tzUA== Date: Mon, 29 Oct 2018 11:53:34 +0000 Message-ID: References: <1538461807-37507-1-git-send-email-viacheslavo@mellanox.com> <1539612815-47199-1-git-send-email-viacheslavo@mellanox.com> <1539612815-47199-6-git-send-email-viacheslavo@mellanox.com> <20181025002759.GA26874@mtidpdk.mti.labs.mlnx> <20181026062307.GD6434@mtidpdk.mti.labs.mlnx> <20181026224248.GE13615@mtidpdk.mti.labs.mlnx> In-Reply-To: <20181026224248.GE13615@mtidpdk.mti.labs.mlnx> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=viacheslavo@mellanox.com; x-originating-ip: [95.67.35.250] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; AM4PR05MB1732; 6:I5RjzIPzfAhSkNtAWQ6DsasfnYzD489bfb2HfFO/rNWtckj95W/gpjUQCdaPiy8GErTc3p78Mc22o/RcNmE4VN9abmbzmL/Y6s5uZipRgurh2muGUXxHxEzacUOqBv4AsGvz0XWDx5BKCdNeA3yG6SsqhIKQPQ8QaDa7Wei7ZFKZHADvlRqnItCdH7L14QxoIq1g9UUEkFmFl30decK/z8SUVHT5+BG3vvdfsh8MO0S9xTcw4UdZtUgRWasycF2/yC970+SaqzQgFiKDSKzyWEz+AGbjRT8vPEbYggiG3J7GwyfO9EX6YIJOCDmi+DudXerMgc/FF1WW5LqNBpZVoDVvcNKLhfeLBRdcxDtFOUYeTFKlwW4guOOs0VC3fUnc4z+ZLbJrTifA0tXcNrv94JSXUs4SecKlOAOAWlRngq2C3kmprStC2s1ZsWaoRyVNTdYpxkSnToAgwB0qoPDfCkbezASdj5abNae9tLZR9i8=; 5:/gcgDO0RZDSyuM9kPbd5ln16ICYhtrJqRVBMh7WH75bWLxP8nEEkp59hxSMSLERTX2cf7EQyHLNWqmnQ7YosM/MP76Y+fwEMVX8vFsSCj2tD4gA+YXgdm8GaRLYhI/jM/gmzFNBfZHRaBPZdqhLw0pUl8KEX6IlhJuIKqbcaZqY=; 7:jWM4HB+3VMUu4yakS3nh/1Tt0J5wQESd670lDTOadzWw4xIVqFRSb1UHiG/ZGfcfxHSeQOqIS+tuT/P8OjG8k+KwJO26agS05bmMuuAgJ8sXokp5vr1SATN9R+JwCfn3gK7f9T6yhayoNY8uTD/gBw== x-ms-exchange-antispam-srfa-diagnostics: SOS; x-ms-office365-filtering-correlation-id: 77c9428f-1b80-42c9-0e3a-08d63d9527f2 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(7020095)(4652040)(8989299)(5600074)(711020)(4618075)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(2017052603328)(7153060)(7193020); SRVR:AM4PR05MB1732; x-ms-traffictypediagnostic: AM4PR05MB1732: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(211171220733660)(788757137089); x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0; PCL:0; RULEID:(8211001083)(6040522)(2401047)(5005006)(8121501046)(10201501046)(3002001)(93006095)(93001095)(3231382)(944501410)(52105095)(6055026)(148016)(149066)(150057)(6041310)(201703131423095)(201702281528075)(20161123555045)(201703061421075)(201703061406153)(20161123558120)(20161123564045)(20161123560045)(20161123562045)(201708071742011)(7699051)(76991095); SRVR:AM4PR05MB1732; BCL:0; PCL:0; RULEID:; SRVR:AM4PR05MB1732; x-forefront-prvs: 084080FC15 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(39860400002)(346002)(136003)(396003)(366004)(376002)(199004)(189003)(13464003)(52314003)(6116002)(3846002)(5250100002)(71190400001)(7696005)(54906003)(53546011)(106356001)(6506007)(2906002)(8676002)(71200400001)(66066001)(81166006)(81156014)(8936002)(476003)(486006)(478600001)(97736004)(68736007)(99286004)(561944003)(2900100001)(33656002)(11346002)(446003)(102836004)(53946003)(316002)(55016002)(25786009)(6246003)(14444005)(5024004)(256004)(26005)(9686003)(186003)(7736002)(5660300001)(6636002)(14454004)(105586002)(93886005)(4744004)(86362001)(53936002)(76176011)(4326008)(229853002)(6862004)(74316002)(6436002)(305945005)(569006); DIR:OUT; SFP:1101; SCL:1; SRVR:AM4PR05MB1732; H:AM4PR05MB3265.eurprd05.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; A:1; MX:1; received-spf: None (protection.outlook.com: mellanox.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: LIT62mP6AVLTmFmTdqvbtLgfyedfT67Nttk5pUT5JOyNTWbSTwj/HlCULchFd6l2m21hwwSPRALZmZvVu3ZvdopaeottTUNhu+SEc929B/1RtTC3g2AgPUupLggP+bw57IkL0BSY1pSZSbu39765dq3lotTgWtf6tepAfFD+kx4bOl1Z19bSbUD0bm0xVMdIfyHGd07O52QwlC5uRp5JZ9ULReaPekns1FdNFMUrMlW2B6Ge5D54MrTPDGmS9W9l6PIQ1fUW2SJAv/YUrmNN+DpAA0U/5RIxkskN8y9jNCkk2SdcgSquXxS7GTGVQZzSvmga3/VVw4wT3l5je5kq73EhwFYJzCduBlq+tLp0g3Q= spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-Network-Message-Id: 77c9428f-1b80-42c9-0e3a-08d63d9527f2 X-MS-Exchange-CrossTenant-originalarrivaltime: 29 Oct 2018 11:53:34.7537 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM4PR05MB1732 Subject: Re: [dpdk-dev] [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Oct 2018 11:53:38 -0000 > -----Original Message----- > From: Yongseok Koh > Sent: Saturday, October 27, 2018 1:43 > To: Slava Ovsiienko > Cc: Shahaf Shuler ; dev@dpdk.org > Subject: Re: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices > management >=20 > On Fri, Oct 26, 2018 at 02:35:24AM -0700, Slava Ovsiienko wrote: > > > -----Original Message----- > > > From: Yongseok Koh > > > Sent: Friday, October 26, 2018 9:26 > > > To: Slava Ovsiienko > > > Cc: Shahaf Shuler ; dev@dpdk.org > > > Subject: Re: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices > > > management > > > > > > On Thu, Oct 25, 2018 at 01:21:12PM -0700, Slava Ovsiienko wrote: > > > > > -----Original Message----- > > > > > From: Yongseok Koh > > > > > Sent: Thursday, October 25, 2018 3:28 > > > > > To: Slava Ovsiienko > > > > > Cc: Shahaf Shuler ; dev@dpdk.org > > > > > Subject: Re: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel > > > > > devices management > > > > > > > > > > On Mon, Oct 15, 2018 at 02:13:33PM +0000, Viacheslav Ovsiienko > wrote: > > > > > > VXLAN interfaces are dynamically created for each local UDP > > > > > > port of outer networks and then used as targets for TC > > > > > > "flower" filters in order to perform encapsulation. These > > > > > > VXLAN interfaces are system-wide, the only one device with > > > > > > given UDP port can exist in the system (the attempt of > > > > > > creating another device with the same UDP local port returns > > > > > > EEXIST), so PMD should support the shared device instances > > > > > > database for PMD instances. These VXLAN implicitly created devi= ces > are called VTEPs (Virtual Tunnel End Points). > > > > > > > > > > > > Creation of the VTEP occurs at the moment of rule applying. > > > > > > The link is set up, root ingress qdisc is also initialized. > > > > > > > > > > > > Encapsulation VTEPs are created on per port basis, the single > > > > > > VTEP is attached to the outer interface and is shared for all > > > > > > encapsulation rules on this interface. The source UDP port is > > > > > > automatically selected in range 30000-60000. > > > > > > > > > > > > For decapsulaton one VTEP is created per every unique UDP > > > > > > local port to accept tunnel traffic. The name of created VTEP > > > > > > consists of prefix "vmlx_" and the number of UDP port in > > > > > > decimal digits without leading zeros (vmlx_4789). The VTEP can > > > > > > be preliminary created in the system before the launching > > > > > > application, it allows to share UDP ports between primary > > > > > > and secondary processes. > > > > > > > > > > > > Suggested-by: Adrien Mazarguil > > > > > > Signed-off-by: Viacheslav Ovsiienko > > > > > > --- > > > > > > drivers/net/mlx5/mlx5_flow_tcf.c | 503 > > > > > > ++++++++++++++++++++++++++++++++++++++- > > > > > > 1 file changed, 499 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c > > > > > > b/drivers/net/mlx5/mlx5_flow_tcf.c > > > > > > index d6840d5..efa9c3b 100644 > > > > > > --- a/drivers/net/mlx5/mlx5_flow_tcf.c > > > > > > +++ b/drivers/net/mlx5/mlx5_flow_tcf.c > > > > > > @@ -3443,6 +3443,432 @@ struct pedit_parser { > > > > > > return -err; > > > > > > } > > > > > > > > > > > > +/* VTEP device list is shared between PMD port instances. */ > > > > > > +static LIST_HEAD(, mlx5_flow_tcf_vtep) > > > > > > + vtep_list_vxlan =3D LIST_HEAD_INITIALIZER(); > static > > > > > pthread_mutex_t > > > > > > +vtep_list_mutex =3D PTHREAD_MUTEX_INITIALIZER; > > > > > > > > > > What's the reason for choosing pthread_mutex instead of > rte_*_lock? > > > > > > > > The sharing this database for secondary processes? > > > > > > The static variable isn't shared with sec proc. But you can leave it = as is. > > > > Yes. The sharing just was assumed, not implemented yet. > > > > > > > > > > > + > > > > > > +/** > > > > > > + * Deletes VTEP network device. > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] vtep > > > > > > + * Object represinting the network device to delete. Memory > > > > > > + * allocated for this object is freed by routine. > > > > > > + */ > > > > > > +static void > > > > > > +flow_tcf_delete_iface(struct mlx6_flow_tcf_context *tcf, > > > > > > + struct mlx5_flow_tcf_vtep *vtep) { > > > > > > + struct nlmsghdr *nlh; > > > > > > + struct ifinfomsg *ifm; > > > > > > + alignas(struct nlmsghdr) > > > > > > + uint8_t buf[mnl_nlmsg_size(MNL_ALIGN(sizeof(*ifm))) + 8]; > > > > > > + int ret; > > > > > > + > > > > > > + assert(!vtep->refcnt); > > > > > > + if (vtep->created && vtep->ifindex) { > > > > > > > > > > First of all vtep->created seems of no use. It is introduced to > > > > > select the error message in flow_tcf_create_iface(). I don't see > > > > > any necessity to distinguish between 'vtep is allocated by > > > > > rte_malloc()' and > > > 'vtep is created in kernel'. > > > > > > > > created flag indicates the iface is created by our code. > > > > The VXLAN decap devices must have the specified UDP port, we can > > > > not create multiple VXLAN devices with the same UDP port - EEXIST > > > > is returned. So, we have to share device. One option is create > > > > device before DPDK application launch and use these pre-created > devices. > > > > Inthis case created flag is not set and VXLAN device is not > > > > reinitialized, and > > > not deleted. > > > > > > I can't see any code to use pre-created device (created even before > > > dpdk app launch). Your code just tries to create 'vmlx_xxxx'. Even > > > from your comment in [7/7] patch, PMD will cleanup any leftovers > > > (existing vtep devices) on initialization. Your comment sounds confli= cting > and confusing. > > > > There are two types of VXLAN devices: > > > > - VXLAN decap, not attached to any ifouter. Provides the ingress UDP > > port, we try to share the devices of this type, because we may be > > asked for the specified UDP port. No device/rule cleanup and reinit > needed. > > > > - VXLAN encap, should be attached to ifouter to provide strict egress > > path, no need to share - egress UDP port does not matter. And we need > > to cleanup ifouter, remove other attached VXLAN devices and rules, > > because it is too hard to co-exist with some pre-created setup.. >=20 > I knew that. But how can it justify the need of 'created' field in vtep s= truct? > In this code, it is of no use. But will see how it is used in your v3. >=20 > > > > > And why do you need to check vtep->ifindex as well? If vtep is > > > > > created in kernel and its ifindex isn't set, that should be an > > > > > error which had to be hanled in flow_tcf_create_iface(). Such a > > > > > vtep shouldn't > > > exist. > > > > Yes, if we did not get ifindex of device - vtep is not created, err= or > returned. > > > > We just can not operate w/o ifindex. > > > > > > I know ifindex is needed but my question was checking vtep->ifindex > > > here looked redundant/unnecessary. But as you agreed on having > > > create/get/release_iface(), it doesn't matter much. > > > > Yes. I agree, will refactor the code. > > > > > > > > > > Also, the refcnt management is a bit strange. Please put an > > > > > abstraction by adding create_iface(), get_iface() and > > > > > release_iface(). In the get_ifce(), > > > > > vtep->refcnt should be incremented. And in the release_iface(), > > > > > vtep->it decrease the > > > > OK. Good proposal. I'll refactor the code. > > > > > > > > > refcnt and if it reaches to zero, the iface can be removed. > > > > > create_iface() will set the refcnt to 1. And if you refer to > > > > > mlx5_hrxq_get(), it even does searching the list not by > > > > > repeating the > > > same lookup code here and there. > > > > > That will make your code much simpler. > > > > > > > > > > > + DRV_LOG(INFO, "VTEP delete (%d)", vtep->ifindex); > > > > > > + nlh =3D mnl_nlmsg_put_header(buf); > > > > > > + nlh->nlmsg_type =3D RTM_DELLINK; > > > > > > + nlh->nlmsg_flags =3D NLM_F_REQUEST; > > > > > > + ifm =3D mnl_nlmsg_put_extra_header(nlh, > sizeof(*ifm)); > > > > > > + ifm->ifi_family =3D AF_UNSPEC; > > > > > > + ifm->ifi_index =3D vtep->ifindex; > > > > > > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > > > > > > + if (ret) > > > > > > + DRV_LOG(WARNING, "netlink: error deleting > VXLAN > > > > > " > > > > > > + "encap/decap ifindex %u", > > > > > > + ifm->ifi_index); > > > > > > + } > > > > > > + rte_free(vtep); > > > > > > +} > > > > > > + > > > > > > +/** > > > > > > + * Creates VTEP network device. > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] ifouter > > > > > > + * Outer interface to attach new-created VXLAN device > > > > > > + * If zero the VXLAN device will not be attached to any devi= ce. > > > > > > + * @param[in] port > > > > > > + * UDP port of created VTEP device. > > > > > > + * @param[out] error > > > > > > + * Perform verbose error reporting if not NULL. > > > > > > + * > > > > > > + * @return > > > > > > + * Pointer to created device structure on success, NULL > > > > > > +otherwise > > > > > > + * and rte_errno is set. > > > > > > + */ > > > > > > +#ifndef HAVE_IFLA_VXLAN_COLLECT_METADATA > > > > > > > > > > Why negative(ifndef) first intead of positive(ifdef)? > > > > Hm. Did I miss the rule. Positive #ifdef first? OK. > > > > > > No concrete rule but if there's no specific reason, it would be > > > better to start from ifdef. > > > > > > > > > +static struct mlx5_flow_tcf_vtep* > > > > > > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf > __rte_unused, > > > > > > + unsigned int ifouter __rte_unused, > > > > > > + uint16_t port __rte_unused, > > > > > > + struct rte_flow_error *error) { > > > > > > + rte_flow_error_set(error, ENOTSUP, > > > > > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > NULL, > > > > > > + "netlink: failed to create VTEP, " > > > > > > + "VXLAN metadat is not supported by > kernel"); > > > > > > > > > > Typo. > > > > > > > > OK. "metadata are not supported". > > > > > > > > > > > + return NULL; > > > > > > +} > > > > > > +#else > > > > > > +static struct mlx5_flow_tcf_vtep* > > > > > > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf, > > > > > > > > > > How about adding 'vtep'? It sounds vague - creating a general > interface. > > > > > E.g., flow_tcf_create_vtep_iface()? > > > > > > > > OK. > > > > > > > > > > > > > > > + unsigned int ifouter, > > > > > > + uint16_t port, struct rte_flow_error *error) { > > > > > > + struct mlx5_flow_tcf_vtep *vtep; > > > > > > + struct nlmsghdr *nlh; > > > > > > + struct ifinfomsg *ifm; > > > > > > + char name[sizeof(MLX5_VXLAN_DEVICE_PFX) + 24]; > > > > > > + alignas(struct nlmsghdr) > > > > > > + uint8_t buf[mnl_nlmsg_size(sizeof(*ifm)) + 128 + > > > > > > > > > > Use a macro for '128'. Can't know the meaning. > > > > OK. I think we should calculate the buffer size explicitly. > > > > > > > > > > > > > > > + SZ_NLATTR_DATA_OF(sizeof(name)) + > > > > > > + SZ_NLATTR_NEST * 2 + > > > > > > + SZ_NLATTR_STRZ_OF("vxlan") + > > > > > > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) + > > > > > > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) + > > > > > > + SZ_NLATTR_DATA_OF(sizeof(uint16_t)) + > > > > > > + SZ_NLATTR_DATA_OF(sizeof(uint8_t))]; > > > > > > + struct nlattr *na_info; > > > > > > + struct nlattr *na_vxlan; > > > > > > + rte_be16_t vxlan_port =3D RTE_BE16(port); > > > > > > > > > > Use rte_cpu_to_be_*() instead. > > > > > > > > Yes, I'll recheck the whole code for this issue. > > > > > > > > > > > > > > > + int ret; > > > > > > + > > > > > > + vtep =3D rte_zmalloc(__func__, sizeof(*vtep), > > > > > > + alignof(struct mlx5_flow_tcf_vtep)); > > > > > > + if (!vtep) { > > > > > > + rte_flow_error_set > > > > > > + (error, ENOMEM, > > > > > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > > > > > + NULL, "unadble to allocate memory for > VTEP desc"); > > > > > > + return NULL; > > > > > > + } > > > > > > + *vtep =3D (struct mlx5_flow_tcf_vtep){ > > > > > > + .refcnt =3D 0, > > > > > > + .port =3D port, > > > > > > + .created =3D 0, > > > > > > + .ifouter =3D 0, > > > > > > + .ifindex =3D 0, > > > > > > + .local =3D LIST_HEAD_INITIALIZER(), > > > > > > + .neigh =3D LIST_HEAD_INITIALIZER(), > > > > > > + }; > > > > > > + memset(buf, 0, sizeof(buf)); > > > > > > + nlh =3D mnl_nlmsg_put_header(buf); > > > > > > + nlh->nlmsg_type =3D RTM_NEWLINK; > > > > > > + nlh->nlmsg_flags =3D NLM_F_REQUEST | NLM_F_CREATE | > > > > > NLM_F_EXCL; > > > > > > + ifm =3D mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm)); > > > > > > + ifm->ifi_family =3D AF_UNSPEC; > > > > > > + ifm->ifi_type =3D 0; > > > > > > + ifm->ifi_index =3D 0; > > > > > > + ifm->ifi_flags =3D IFF_UP; > > > > > > + ifm->ifi_change =3D 0xffffffff; > > > > > > + snprintf(name, sizeof(name), "%s%u", > MLX5_VXLAN_DEVICE_PFX, > > > > > port); > > > > > > + mnl_attr_put_strz(nlh, IFLA_IFNAME, name); > > > > > > + na_info =3D mnl_attr_nest_start(nlh, IFLA_LINKINFO); > > > > > > + assert(na_info); > > > > > > + mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan"); > > > > > > + na_vxlan =3D mnl_attr_nest_start(nlh, IFLA_INFO_DATA); > > > > > > + if (ifouter) > > > > > > + mnl_attr_put_u32(nlh, IFLA_VXLAN_LINK, ifouter); > > > > > > + assert(na_vxlan); > > > > > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1); > > > > > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, > 1); > > > > > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_LEARNING, 0); > > > > > > + mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, vxlan_port); > > > > > > + mnl_attr_nest_end(nlh, na_vxlan); > > > > > > + mnl_attr_nest_end(nlh, na_info); > > > > > > + assert(sizeof(buf) >=3D nlh->nlmsg_len); > > > > > > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > > > > > > + if (ret) > > > > > > + DRV_LOG(WARNING, > > > > > > + "netlink: VTEP %s create failure (%d)", > > > > > > + name, rte_errno); > > > > > > + else > > > > > > + vtep->created =3D 1; > > > > > > > > > > Flow of code here isn't smooth, thus could be error-prone. Most > > > > > of all, I don't like ret has multiple meanings. ret should be > > > > > return value but you are using it to store ifindex. > > > > > > > > > > > + if (ret && ifouter) > > > > > > + ret =3D 0; > > > > > > + else > > > > > > + ret =3D if_nametoindex(name); > > > > > > > > > > If vtep isn't created and ifouter is set, then skip init below, > > > > > which means, if > > > > > > > > ifouter is set for VXLAN encap devices. They should be attached to > > > > ifouter and can not be shared. So, if ifouter I set - we do not > > > > use the precreated/existing VXLAN devices. We have to create our > > > > own not > > > shared device. > > > > > > In your code (flow_tcf_encap_vtep_create()), it is shared by multiple > flows. > > > Do you mean it isn't shared between different outer ifaces? If so, > > > that's for sure. > > Sorry, I do not understand the question. > > VXLAN encap device is attached to ifouter and shared by all flows with > > this ifouter. No multiple VXLAN devices are attached to the same ifoute= r, > only one. > > VXLAN decap device has no attached ifouter, so it can not share it. >=20 > Yep, that's what I meant. >=20 > > > > > vtep is created or ifouter is set, it tries to get ifindex of vte= p. > > > > > But why do you want to try to call this API even if it failed to = create > vtep? > > > > > Let's not make code flow convoluted even though it logically work= s. > > > > > Let's make it straightforward. > > > > > > > > > > > + if (ret) { > > > > > > + vtep->ifindex =3D ret; > > > > > > + vtep->ifouter =3D ifouter; > > > > > > + memset(buf, 0, sizeof(buf)); > > > > > > + nlh =3D mnl_nlmsg_put_header(buf); > > > > > > + nlh->nlmsg_type =3D RTM_NEWLINK; > > > > > > + nlh->nlmsg_flags =3D NLM_F_REQUEST; > > > > > > + ifm =3D mnl_nlmsg_put_extra_header(nlh, > sizeof(*ifm)); > > > > > > + ifm->ifi_family =3D AF_UNSPEC; > > > > > > + ifm->ifi_type =3D 0; > > > > > > + ifm->ifi_index =3D vtep->ifindex; > > > > > > + ifm->ifi_flags =3D IFF_UP; > > > > > > + ifm->ifi_change =3D IFF_UP; > > > > > > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > > > > > > + if (ret) { > > > > > > + DRV_LOG(WARNING, > > > > > > + "netlink: VTEP %s set link up failure > (%d)", > > > > > > + name, rte_errno); > > > > > > + rte_free(vtep); > > > > > > + rte_flow_error_set > > > > > > + (error, -errno, > > > > > > + > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > > > > NULL, > > > > > > + "netlink: failed to set VTEP link up"); > > > > > > + vtep =3D NULL; > > > > > > + } else { > > > > > > + ret =3D mlx5_flow_tcf_init(tcf, vtep->ifindex, > error); > > > > > > + if (ret) > > > > > > + DRV_LOG(WARNING, > > > > > > + "VTEP %s init failure (%d)", name, > rte_errno); > > > > > > + } > > > > > > + } else { > > > > > > + DRV_LOG(WARNING, > > > > > > + "VTEP %s failed to get index (%d)", name, > errno); > > > > > > + rte_flow_error_set > > > > > > + (error, -errno, > > > > > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > NULL, > > > > > > + !vtep->created ? "netlink: failed to create > VTEP" : > > > > > > + "netlink: failed to retrieve VTEP ifindex"); > > > > > > + ret =3D 1; > > > > > > > > > > If it fails to create a vtep above, it will print out two > > > > > warning messages and one rte_flow_error message. And it even > > > > > selects message to print between two? > > > > > And there's another info msg at the end even in case of failure. > > > > > Do you really want to do this even with manipulating ret to > > > > > change code path? Not a good practice. > > > > > > > > > > Usually, code path should be straightforward for sucessful path > > > > > and for errors/failures, return immediately or use 'goto' if > > > > > there's need for > > > cleanup. > > > > > > > > > > Please refactor entire function. > > > > > > > > I think I'll split it in two ones - for attached and potentially sh= ared > ifaces. > > > > > > > > > > > + } > > > > > > + if (ret) { > > > > > > + flow_tcf_delete_iface(tcf, vtep); > > > > > > + vtep =3D NULL; > > > > > > + } > > > > > > + DRV_LOG(INFO, "VTEP create (%d, %s)", vtep->port, vtep ? > "OK" : > > > > > "error"); > > > > > > + return vtep; > > > > > > +} > > > > > > +#endif /* HAVE_IFLA_VXLAN_COLLECT_METADATA */ > > > > > > + > > > > > > +/** > > > > > > + * Create target interface index for VXLAN tunneling > decapsulation. > > > > > > + * In order to share the UDP port within the other interfaces > > > > > > +the > > > > > > + * VXLAN device created as not attached to any interface (if > created). > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] dev_flow > > > > > > + * Flow tcf object with tunnel structure pointer set. > > > > > > + * @param[out] error > > > > > > + * Perform verbose error reporting if not NULL. > > > > > > + * @return > > > > > > + * Interface index on success, zero otherwise and rte_errno = is set. > > > > > > > > > > Return negative errno in case of failure like others. > > > > > > > > Anyway, we have to return an index. If we do not return it as > > > > function result we will need to provide some extra pointing > > > > parameter, it > > > complicates the code. > > > > > > You misunderstood it. See what I wrote below. The function still > > > returns the index but in case of error, make it return negative errno > instead of zero. > > > > > > > > > > > > > * Interface index on success, a negative errno value otherwise= and > > > > > rte_errno is set. > > > > > > > > > > > + */ > > > > > > +static unsigned int > > > > > > +flow_tcf_decap_vtep_create(struct mlx5_flow_tcf_context *tcf, > > > > > > + struct mlx5_flow *dev_flow, > > > > > > + struct rte_flow_error *error) { > > > > > > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > > > > > > + uint16_t port =3D dev_flow->tcf.vxlan_decap->udp_port; > > > > > > + > > > > > > + vtep =3D NULL; > > > > > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > > > > > > + if (vlst->port =3D=3D port) { > > > > > > + vtep =3D vlst; > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > > > > > You just need one variable. > > > > > > > > Yes. There is a long story, I forgot to revert code to one > > > > variable after > > > debugging. > > > > > > > > > > struct mlx5_flow_tcf_vtep *vtep; > > > > > > > > > > LIST_FOREACH(vtep, &vtep_list_vxlan, next) { > > > > > if (vtep->port =3D=3D port) > > > > > break; > > > > > } > > > > > > > > > > > + if (!vtep) { > > > > > > + vtep =3D flow_tcf_create_iface(tcf, 0, port, error); > > > > > > + if (vtep) > > > > > > + LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, > next); > > > > > > + } else { > > > > > > + if (vtep->ifouter) { > > > > > > + rte_flow_error_set(error, -errno, > > > > > > + > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > > > > NULL, > > > > > > + "Failed to create decap VTEP, > attached " > > > > > > + "device with the same UDP port > exists"); > > > > > > + vtep =3D NULL; > > > > > > > > > > Making vtep null to skip the following code? > > > > > > > > Yes. To avoid multiple return operators in code. > > > > > > It's okay to have multiple returns. Why not? > > > > It is easy to miss the return in the midst of function while > refactoring/modifying the code. >=20 > Your code path doesn't look easy and free from error. Please refer to oth= er > control path functions in this PMD. >=20 > > > > > Please merge the two same > > > > > if/else and make the code path strightforward. And which errno > > > > > do you expect here? > > > > > Should it be set EEXIST instead? > > > > Not always. Netlink returns the code. > > > > > > No, that's not my point. Your code above sets errno instead of > > > rte_errno or EEXIST. > > > > > > } else { > > > if (vtep->ifouter) { > > > rte_flow_error_set(error, -errno, > > > > > > Which one sets this errno? Here, it sets rte_errno because matched > > > vtep > > libmnl sets, while processing the Netlink reply message (callback.c of = libmnl > sources). >=20 > You still don't understand my point. >=20 > In this flow_tcf_decap_vtep_create(), if vtep is found (vtep !=3D NULL), = how > can errno be set? Before the if/else, there's no libmnl call. >=20 > > > can't be used as it already has outer iface attached (error message > > > isn't clear, please reword it too). I thought this should be EEXIST > > > but you set errno to rte_errno but errno isn't valid at this point. > > > > > > > > > > > > > > > > > > + } > > > > > > + } > > > > > > + if (vtep) { > > > > > > + vtep->refcnt++; > > > > > > + assert(vtep->ifindex); > > > > > > + return vtep->ifindex; > > > > > > + } else { > > > > > > + return 0; > > > > > > + } > > > > > > > > > > Why repeating same if/else? > > > > > > > > > > > > > > > This is my suggestion but if you take my suggestion to have > > > > > flow_tcf_[create|get|release]_iface(), this will get much simpler= . > > > > Agree. > > > > > > > > > > > > > > { > > > > > struct mlx5_flow_tcf_vtep *vtep; > > > > > uint16_t port =3D dev_flow->tcf.vxlan_decap->udp_port; > > > > > > > > > > LIST_FOREACH(vtep, &vtep_list_vxlan, next) { > > > > > if (vtep->port =3D=3D port) > > > > > break; > > > > > } > > > > > if (vtep && vtep->ifouter) > > > > > return rte_flow_error_set(... EEXIST ...); > > > > > else if (vtep) { > > > > > ++vtep->refcnt; > > > > > } else { > > > > > vtep =3D flow_tcf_create_iface(tcf, 0, port, error); > > > > > if (!vtep) > > > > > return rte_flow_error_set(...); > > > > > LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next); > > > > > } > > > > > assert(vtep->ifindex); > > > > > return vtep->ifindex; > > > > > } > > > > > > > > > > > > > > > > +} > > > > > > + > > > > > > +/** > > > > > > + * Creates target interface index for VXLAN tunneling > encapsulation. > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] ifouter > > > > > > + * Network interface index to attach VXLAN encap device to. > > > > > > + * @param[in] dev_flow > > > > > > + * Flow tcf object with tunnel structure pointer set. > > > > > > + * @param[out] error > > > > > > + * Perform verbose error reporting if not NULL. > > > > > > + * @return > > > > > > + * Interface index on success, zero otherwise and rte_errno = is set. > > > > > > + */ > > > > > > +static unsigned int > > > > > > +flow_tcf_encap_vtep_create(struct mlx5_flow_tcf_context *tcf, > > > > > > + unsigned int ifouter, > > > > > > + struct mlx5_flow *dev_flow __rte_unused, > > > > > > + struct rte_flow_error *error) { > > > > > > + static uint16_t encap_port =3D > MLX5_VXLAN_PORT_RANGE_MIN - 1; > > > > > > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > > > > > > + > > > > > > + assert(ifouter); > > > > > > + /* Look whether the attached VTEP for encap is created. */ > > > > > > + vtep =3D NULL; > > > > > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > > > > > > + if (vlst->ifouter =3D=3D ifouter) { > > > > > > + vtep =3D vlst; > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > > > > > Same here. > > > > > > > > > > > + if (!vtep) { > > > > > > + uint16_t pcnt; > > > > > > + > > > > > > + /* Not found, we should create the new attached > VTEP. */ > > > > > > +/* > > > > > > + * TODO: not implemented yet > > > > > > + * flow_tcf_encap_iface_cleanup(tcf, ifouter); > > > > > > + * flow_tcf_encap_local_cleanup(tcf, ifouter); > > > > > > + * flow_tcf_encap_neigh_cleanup(tcf, ifouter); */ > > > > > > > > > > Personal note is not appropriate even though it is removed in > > > > > the following patch. > > > > > > > > > > > + for (pcnt =3D 0; pcnt <=3D > (MLX5_VXLAN_PORT_RANGE_MAX > > > > > > + - > MLX5_VXLAN_PORT_RANGE_MIN); > > > > > pcnt++) { > > > > > > + encap_port++; > > > > > > + /* Wraparound the UDP port index. */ > > > > > > + if (encap_port < > MLX5_VXLAN_PORT_RANGE_MIN > > > > > || > > > > > > + encap_port > > MLX5_VXLAN_PORT_RANGE_MAX) > > > > > > + encap_port =3D > > > > > MLX5_VXLAN_PORT_RANGE_MIN; > > > > > > + /* Check whether UDP port is in already in > use. */ > > > > > > + vtep =3D NULL; > > > > > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > > > > > > + if (vlst->port =3D=3D encap_port) { > > > > > > + vtep =3D vlst; > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > > > > > If you want to find out an empty port number, you can use > > > > > rte_bitmap instead of repeating searching the entire list for > > > > > all possible port > > > numbers. > > > > > > > > We do not expect too many VXLAN devices have been created. bitmap. > > > > > > +1, valid point. > > > > > > > > > + if (vtep) { > > > > > > + vtep =3D NULL; > > > > > > + continue; > > > > > > + } > > > > > > + vtep =3D flow_tcf_create_iface(tcf, ifouter, > > > > > > + encap_port, > error); > > > > > > + if (vtep) { > > > > > > + LIST_INSERT_HEAD(&vtep_list_vxlan, > vtep, > > > > > next); > > > > > > + break; > > > > > > + } > > > > > > + if (rte_errno !=3D EEXIST) > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > + if (!vtep) > > > > > > + return 0; > > > > > > + vtep->refcnt++; > > > > > > + assert(vtep->ifindex); > > > > > > + return vtep->ifindex; > > > > > > > > > > Please refactor this func according to what I suggested for > > > > > flow_tcf_decap_vtep_create() and flow_tcf_delete_iface(). > > > > > > > > > > > +} > > > > > > + > > > > > > +/** > > > > > > + * Creates target interface index for tunneling of any type. > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] ifouter > > > > > > + * Network interface index to attach VXLAN encap device to. > > > > > > + * @param[in] dev_flow > > > > > > + * Flow tcf object with tunnel structure pointer set. > > > > > > + * @param[out] error > > > > > > + * Perform verbose error reporting if not NULL. > > > > > > + * @return > > > > > > + * Interface index on success, zero otherwise and rte_errno = is set. > > > > > > > > > > * Interface index on success, a negative errno value otherwise= and > > > > > * rte_errno is set. > > > > > > > > > > > + */ > > > > > > +static unsigned int > > > > > > +flow_tcf_tunnel_vtep_create(struct mlx5_flow_tcf_context *tcf, > > > > > > + unsigned int ifouter, > > > > > > + struct mlx5_flow *dev_flow, > > > > > > + struct rte_flow_error *error) { > > > > > > + unsigned int ret; > > > > > > + > > > > > > + assert(dev_flow->tcf.tunnel); > > > > > > + pthread_mutex_lock(&vtep_list_mutex); > > > > > > + switch (dev_flow->tcf.tunnel->type) { > > > > > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP: > > > > > > + ret =3D flow_tcf_encap_vtep_create(tcf, ifouter, > > > > > > + dev_flow, error); > > > > > > + break; > > > > > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP: > > > > > > + ret =3D flow_tcf_decap_vtep_create(tcf, dev_flow, > error); > > > > > > + break; > > > > > > + default: > > > > > > + rte_flow_error_set(error, ENOTSUP, > > > > > > + > RTE_FLOW_ERROR_TYPE_UNSPECIFIED, > > > > > NULL, > > > > > > + "unsupported tunnel type"); > > > > > > + ret =3D 0; > > > > > > + break; > > > > > > + } > > > > > > + pthread_mutex_unlock(&vtep_list_mutex); > > > > > > + return ret; > > > > > > +} > > > > > > + > > > > > > +/** > > > > > > + * Deletes tunneling interface by UDP port. > > > > > > + * > > > > > > + * @param[in] tcf > > > > > > + * Context object initialized by mlx5_flow_tcf_context_creat= e(). > > > > > > + * @param[in] ifindex > > > > > > + * Network interface index of VXLAN device. > > > > > > + * @param[in] dev_flow > > > > > > + * Flow tcf object with tunnel structure pointer set. > > > > > > + */ > > > > > > +static void > > > > > > +flow_tcf_tunnel_vtep_delete(struct mlx5_flow_tcf_context *tcf, > > > > > > + unsigned int ifindex, > > > > > > + struct mlx5_flow *dev_flow) { > > > > > > + struct mlx5_flow_tcf_vtep *vtep, *vlst; > > > > > > + > > > > > > + assert(dev_flow->tcf.tunnel); > > > > > > + pthread_mutex_lock(&vtep_list_mutex); > > > > > > + vtep =3D NULL; > > > > > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) { > > > > > > + if (vlst->ifindex =3D=3D ifindex) { > > > > > > + vtep =3D vlst; > > > > > > + break; > > > > > > + } > > > > > > + } > > > > > > > > > > It is weird. You just can have vtep pointer in the > > > > > dev_flow->tcf.tunnel instead of ifindex_tun which is same as > > > > > vtep->ifindex like the assertion below. Then, this lookup can be > skipped. > > > > > > > > OK. Good optimization. > > > > > > > > > > > > > > > + if (!vtep) { > > > > > > + DRV_LOG(WARNING, "No VTEP device found in the > list"); > > > > > > + goto exit; > > > > > > + } > > > > > > + switch (dev_flow->tcf.tunnel->type) { > > > > > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP: > > > > > > + break; > > > > > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP: > > > > > > +/* > > > > > > + * TODO: Remove the encap ancillary rules first. > > > > > > + * flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL); > > > > > > + * flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL); > > > > > > +*/ > > > > > > > > > > Is it a personal note? Please remove. > > > > OK. > > > > > > > > > > > > > > > + break; > > > > > > + default: > > > > > > + assert(false); > > > > > > + DRV_LOG(WARNING, "Unsupported tunnel type"); > > > > > > + break; > > > > > > + } > > > > > > + assert(dev_flow->tcf.tunnel->ifindex_tun =3D=3D vtep->ifindex= ); > > > > > > + assert(vtep->refcnt); > > > > > > + if (!vtep->refcnt || !--vtep->refcnt) { > > > > > > + LIST_REMOVE(vtep, next); > > > > > > + flow_tcf_delete_iface(tcf, vtep); > > > > > > + } > > > > > > +exit: > > > > > > + pthread_mutex_unlock(&vtep_list_mutex); > > > > > > +} > > > > > > + > > > > > > /** > > > > > > * Apply flow to E-Switch by sending Netlink message. > > > > > > * > > > > > > @@ -3461,18 +3887,61 @@ struct pedit_parser { > > > > > > struct rte_flow_error *error) { > > > > > > struct priv *priv =3D dev->data->dev_private; > > > > > > - struct mlx5_flow_tcf_context *nl =3D priv->tcf_context; > > > > > > + struct mlx5_flow_tcf_context *tcf =3D priv->tcf_context; > > > > > > struct mlx5_flow *dev_flow; > > > > > > struct nlmsghdr *nlh; > > > > > > + int ret; > > > > > > > > > > > > dev_flow =3D LIST_FIRST(&flow->dev_flows); > > > > > > /* E-Switch flow can't be expanded. */ > > > > > > assert(!LIST_NEXT(dev_flow, next)); > > > > > > + if (dev_flow->tcf.applied) > > > > > > + return 0; > > > > > > nlh =3D dev_flow->tcf.nlh; > > > > > > nlh->nlmsg_type =3D RTM_NEWTFILTER; > > > > > > nlh->nlmsg_flags =3D NLM_F_REQUEST | NLM_F_CREATE | > > > > > NLM_F_EXCL; > > > > > > - if (!flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL)) > > > > > > + if (dev_flow->tcf.tunnel) { > > > > > > + /* > > > > > > + * Replace the interface index, target for > > > > > > + * encapsulation, source for decapsulation. > > > > > > + */ > > > > > > + assert(!dev_flow->tcf.tunnel->ifindex_tun); > > > > > > + assert(dev_flow->tcf.tunnel->ifindex_ptr); > > > > > > + /* Create actual VTEP device when rule is being > applied. */ > > > > > > + dev_flow->tcf.tunnel->ifindex_tun > > > > > > + =3D flow_tcf_tunnel_vtep_create(tcf, > > > > > > + *dev_flow->tcf.tunnel- > >ifindex_ptr, > > > > > > + dev_flow, error); > > > > > > + DRV_LOG(INFO, "Replace ifindex: %d->%d", > > > > > > + dev_flow->tcf.tunnel->ifindex_tun, > > > > > > + *dev_flow->tcf.tunnel->ifindex_ptr); > > > > > > + if (!dev_flow->tcf.tunnel->ifindex_tun) > > > > > > + return -rte_errno; > > > > > > + dev_flow->tcf.tunnel->ifindex_org > > > > > > + =3D *dev_flow->tcf.tunnel->ifindex_ptr; > > > > > > + *dev_flow->tcf.tunnel->ifindex_ptr > > > > > > + =3D dev_flow->tcf.tunnel->ifindex_tun; > > > > > > + } > > > > > > + ret =3D flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL); > > > > > > + if (dev_flow->tcf.tunnel) { > > > > > > + DRV_LOG(INFO, "Restore ifindex: %d->%d", > > > > > > + dev_flow->tcf.tunnel->ifindex_org, > > > > > > + *dev_flow->tcf.tunnel->ifindex_ptr); > > > > > > + *dev_flow->tcf.tunnel->ifindex_ptr > > > > > > + =3D dev_flow->tcf.tunnel->ifindex_org; > > > > > > + dev_flow->tcf.tunnel->ifindex_org =3D 0; > > > > > > > > > > ifindex_org looks a temporary storage in this code. And this > > > > > kind of hassle > > > > > (replace/restore) is there because you took the ifindex from the > > > > > netlink message. Why don't you have just > > > > > > > > > > struct mlx5_flow_tcf_tunnel_hdr { > > > > > uint32_t type; /**< Tunnel action type. */ > > > > > unsigned int ifindex; /**< Original dst/src interface */ > > > > > struct mlx5_flow_tcf_vtep *vtep; /**< Tunnel endpoint device. */ > > > > > unsigned int *nlmsg_ifindex_ptr; /**< ifindex ptr in Netlink mes= sage. > > > > > */ }; > > > > > > > > > > and don't change ifindex? > > > > > > > > I propose to use the local variable for ifindex_org and do not > > > > keep it in structure. *ifindex_ptr will keep. > > > > > > Well, you still have to restore the ifindex whenever sending the nl > > > msg. Most of all, ifindex_ptr in nl msg isn't a right place to store = the > ifindex. > > It is stored there for rules w/o tunnels. It is its "native" place, Id > > prefer not to create some new location to store the original index and = save > some space. > > We have to swap indices only if rule has requested the tunneling. We > > can not >=20 > No no. At this point, flow is already created to be tunneled one. What do= you > mean by 'rules w/o tunnels' or 'only if rule has requested the tunneling'= ?? I mean the code handles all kind of rules - with tunnel and w/o tunnels. The same code prepares the NL message for both rule types. > It has already been created as a vxlan tunnel rule. It won't be changed. = The > nlmsg is supposed to have vtep ifindex but translation didn't know it and > stored the outer iface temporarily to get it replaced by vtep ifindex. It= never > be a 'native'/'original' place to store it. I mean, if rule does not request the tunneling action - it just keeps the unchanged ifindex within Netlink message. If there is the tunneling - we re= place this index with some value depending on this ifindex. We cannot replace ifindex permanently at rule translation once, because VTEPs are created=20 dynamically and VTEP ifindex can be different at the rule applying time. So, we need to keep the original ifindex and create VTEP depending on it ev= ery time rule is being applied. > In which case the nl msg can be sent > with the 'original' ifindex? Any specific example? No. >=20 > > set tunnel index permanently, because rule can be > > applied/removed/reapplied and other new VXLAN device with new index > >can be recreated. >=20 > Every time it is applied, it will get the vtep and overwrite vtep ifindex= in the nl > msg. Yes. We should overwrite this field anyway, and every time at rule applying= . Because vtep ifindex can be different. And we need to keep the original ifi= ndex (for example to dynamically create VTEP attached to it). Do you propose to= keep ifindex_org field? Now, as I can see we have to keep ifindex_ptr field onl= y. >=20 > > > have vtep ifindex but it just temporarily keeps the device ifindex > > > until vtep is created/found. >=20 > Thanks, > Yongseok