DPDK patches and discussions
 help / color / mirror / Atom feed
From: Yongseok Koh <yskoh@mellanox.com>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management
Date: Thu, 25 Oct 2018 00:28:16 +0000
Message-ID: <20181025002759.GA26874@mtidpdk.mti.labs.mlnx> (raw)
In-Reply-To: <1539612815-47199-6-git-send-email-viacheslavo@mellanox.com>

On Mon, Oct 15, 2018 at 02:13:33PM +0000, Viacheslav Ovsiienko wrote:
> VXLAN interfaces are dynamically created for each local UDP port
> of outer networks and then used as targets for TC "flower" filters
> in order to perform encapsulation. These VXLAN interfaces are
> system-wide, the only one device with given UDP port can exist
> in the system (the attempt of creating another device with the
> same UDP local port returns EEXIST), so PMD should support the
> shared device instances database for PMD instances. These VXLAN
> implicitly created devices are called VTEPs (Virtual Tunnel
> End Points).
> 
> Creation of the VTEP occurs at the moment of rule applying. The
> link is set up, root ingress qdisc is also initialized.
> 
> Encapsulation VTEPs are created on per port basis, the single
> VTEP is attached to the outer interface and is shared for all
> encapsulation rules on this interface. The source UDP port is
> automatically selected in range 30000-60000.
> 
> For decapsulaton one VTEP is created per every unique UDP
> local port to accept tunnel traffic. The name of created
> VTEP consists of prefix "vmlx_" and the number of UDP port in
> decimal digits without leading zeros (vmlx_4789). The VTEP
> can be preliminary created in the system before the launching
> application, it allows to share	UDP ports between primary
> and secondary processes.
> 
> Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_flow_tcf.c | 503 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 499 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c b/drivers/net/mlx5/mlx5_flow_tcf.c
> index d6840d5..efa9c3b 100644
> --- a/drivers/net/mlx5/mlx5_flow_tcf.c
> +++ b/drivers/net/mlx5/mlx5_flow_tcf.c
> @@ -3443,6 +3443,432 @@ struct pedit_parser {
>  	return -err;
>  }
>  
> +/* VTEP device list is shared between PMD port instances. */
> +static LIST_HEAD(, mlx5_flow_tcf_vtep)
> +			vtep_list_vxlan = LIST_HEAD_INITIALIZER();
> +static pthread_mutex_t vtep_list_mutex = PTHREAD_MUTEX_INITIALIZER;

What's the reason for choosing pthread_mutex instead of rte_*_lock?

> +
> +/**
> + * Deletes VTEP network device.
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] vtep
> + *   Object represinting the network device to delete. Memory
> + *   allocated for this object is freed by routine.
> + */
> +static void
> +flow_tcf_delete_iface(struct mlx6_flow_tcf_context *tcf,
> +		      struct mlx5_flow_tcf_vtep *vtep)
> +{
> +	struct nlmsghdr *nlh;
> +	struct ifinfomsg *ifm;
> +	alignas(struct nlmsghdr)
> +	uint8_t buf[mnl_nlmsg_size(MNL_ALIGN(sizeof(*ifm))) + 8];
> +	int ret;
> +
> +	assert(!vtep->refcnt);
> +	if (vtep->created && vtep->ifindex) {

First of all vtep->created seems of no use. It is introduced to select the error
message in flow_tcf_create_iface(). I don't see any necessity to distinguish
between 'vtep is allocated by rte_malloc()' and 'vtep is created in kernel'.

And why do you need to check vtep->ifindex as well? If vtep is created in kernel
and its ifindex isn't set, that should be an error which had to be hanled in
flow_tcf_create_iface(). Such a vtep shouldn't exist.

Also, the refcnt management is a bit strange. Please put an abstraction by
adding create_iface(), get_iface() and release_iface(). In the get_ifce(),
vtep->refcnt should be incremented. And in the release_iface(), it decrease the
refcnt and if it reaches to zero, the iface can be removed. create_iface() will
set the refcnt to 1. And if you refer to mlx5_hrxq_get(), it even does searching
the list not by repeating the same lookup code here and there. That will make
your code much simpler.

> +		DRV_LOG(INFO, "VTEP delete (%d)", vtep->ifindex);
> +		nlh = mnl_nlmsg_put_header(buf);
> +		nlh->nlmsg_type = RTM_DELLINK;
> +		nlh->nlmsg_flags = NLM_F_REQUEST;
> +		ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> +		ifm->ifi_family = AF_UNSPEC;
> +		ifm->ifi_index = vtep->ifindex;
> +		ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> +		if (ret)
> +			DRV_LOG(WARNING, "netlink: error deleting VXLAN "
> +					 "encap/decap ifindex %u",
> +					 ifm->ifi_index);
> +	}
> +	rte_free(vtep);
> +}
> +
> +/**
> + * Creates VTEP network device.
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] ifouter
> + *   Outer interface to attach new-created VXLAN device
> + *   If zero the VXLAN device will not be attached to any device.
> + * @param[in] port
> + *   UDP port of created VTEP device.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + *
> + * @return
> + * Pointer to created device structure on success, NULL otherwise
> + * and rte_errno is set.
> + */
> +#ifndef HAVE_IFLA_VXLAN_COLLECT_METADATA

Why negative(ifndef) first intead of positive(ifdef)?

> +static struct mlx5_flow_tcf_vtep*
> +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf __rte_unused,
> +		      unsigned int ifouter __rte_unused,
> +		      uint16_t port __rte_unused,
> +		      struct rte_flow_error *error)
> +{
> +	rte_flow_error_set(error, ENOTSUP,
> +			 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +			 "netlink: failed to create VTEP, "
> +			 "VXLAN metadat is not supported by kernel");

Typo.

> +	return NULL;
> +}
> +#else
> +static struct mlx5_flow_tcf_vtep*
> +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf,

How about adding 'vtep'? It sounds vague - creating a general interface.
E.g., flow_tcf_create_vtep_iface()?

> +		      unsigned int ifouter,
> +		      uint16_t port, struct rte_flow_error *error)
> +{
> +	struct mlx5_flow_tcf_vtep *vtep;
> +	struct nlmsghdr *nlh;
> +	struct ifinfomsg *ifm;
> +	char name[sizeof(MLX5_VXLAN_DEVICE_PFX) + 24];
> +	alignas(struct nlmsghdr)
> +	uint8_t buf[mnl_nlmsg_size(sizeof(*ifm)) + 128 +

Use a macro for '128'. Can't know the meaning.

> +		       SZ_NLATTR_DATA_OF(sizeof(name)) +
> +		       SZ_NLATTR_NEST * 2 +
> +		       SZ_NLATTR_STRZ_OF("vxlan") +
> +		       SZ_NLATTR_DATA_OF(sizeof(uint32_t)) +
> +		       SZ_NLATTR_DATA_OF(sizeof(uint32_t)) +
> +		       SZ_NLATTR_DATA_OF(sizeof(uint16_t)) +
> +		       SZ_NLATTR_DATA_OF(sizeof(uint8_t))];
> +	struct nlattr *na_info;
> +	struct nlattr *na_vxlan;
> +	rte_be16_t vxlan_port = RTE_BE16(port);

Use rte_cpu_to_be_*() instead.

> +	int ret;
> +
> +	vtep = rte_zmalloc(__func__, sizeof(*vtep),
> +			alignof(struct mlx5_flow_tcf_vtep));
> +	if (!vtep) {
> +		rte_flow_error_set
> +			(error, ENOMEM, RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> +			 NULL, "unadble to allocate memory for VTEP desc");
> +		return NULL;
> +	}
> +	*vtep = (struct mlx5_flow_tcf_vtep){
> +			.refcnt = 0,
> +			.port = port,
> +			.created = 0,
> +			.ifouter = 0,
> +			.ifindex = 0,
> +			.local = LIST_HEAD_INITIALIZER(),
> +			.neigh = LIST_HEAD_INITIALIZER(),
> +	};
> +	memset(buf, 0, sizeof(buf));
> +	nlh = mnl_nlmsg_put_header(buf);
> +	nlh->nlmsg_type = RTM_NEWLINK;
> +	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE  | NLM_F_EXCL;
> +	ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> +	ifm->ifi_family = AF_UNSPEC;
> +	ifm->ifi_type = 0;
> +	ifm->ifi_index = 0;
> +	ifm->ifi_flags = IFF_UP;
> +	ifm->ifi_change = 0xffffffff;
> +	snprintf(name, sizeof(name), "%s%u", MLX5_VXLAN_DEVICE_PFX, port);
> +	mnl_attr_put_strz(nlh, IFLA_IFNAME, name);
> +	na_info = mnl_attr_nest_start(nlh, IFLA_LINKINFO);
> +	assert(na_info);
> +	mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan");
> +	na_vxlan = mnl_attr_nest_start(nlh, IFLA_INFO_DATA);
> +	if (ifouter)
> +		mnl_attr_put_u32(nlh, IFLA_VXLAN_LINK, ifouter);
> +	assert(na_vxlan);
> +	mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1);
> +	mnl_attr_put_u8(nlh, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1);
> +	mnl_attr_put_u8(nlh, IFLA_VXLAN_LEARNING, 0);
> +	mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, vxlan_port);
> +	mnl_attr_nest_end(nlh, na_vxlan);
> +	mnl_attr_nest_end(nlh, na_info);
> +	assert(sizeof(buf) >= nlh->nlmsg_len);
> +	ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> +	if (ret)
> +		DRV_LOG(WARNING,
> +			"netlink: VTEP %s create failure (%d)",
> +			name, rte_errno);
> +	else
> +		vtep->created = 1;

Flow of code here isn't smooth, thus could be error-prone. Most of all, I don't
like ret has multiple meanings. ret should be return value but you are using it
to store ifindex.

> +	if (ret && ifouter)
> +		ret = 0;
> +	else
> +		ret = if_nametoindex(name);

If vtep isn't created and ifouter is set, then skip init below, which
means, if vtep is created or ifouter is set, it tries to get ifindex of vtep.
But why do you want to try to call this API even if it failed to create vtep?
Let's not make code flow convoluted even though it logically works. Let's make
it straightforward.

> +	if (ret) {
> +		vtep->ifindex = ret;
> +		vtep->ifouter = ifouter;
> +		memset(buf, 0, sizeof(buf));
> +		nlh = mnl_nlmsg_put_header(buf);
> +		nlh->nlmsg_type = RTM_NEWLINK;
> +		nlh->nlmsg_flags = NLM_F_REQUEST;
> +		ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> +		ifm->ifi_family = AF_UNSPEC;
> +		ifm->ifi_type = 0;
> +		ifm->ifi_index = vtep->ifindex;
> +		ifm->ifi_flags = IFF_UP;
> +		ifm->ifi_change = IFF_UP;
> +		ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> +		if (ret) {
> +			DRV_LOG(WARNING,
> +				"netlink: VTEP %s set link up failure (%d)",
> +				name, rte_errno);
> +			rte_free(vtep);
> +			rte_flow_error_set
> +				(error, -errno,
> +				 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +				 "netlink: failed to set VTEP link up");
> +			vtep = NULL;
> +		} else {
> +			ret = mlx5_flow_tcf_init(tcf, vtep->ifindex, error);
> +			if (ret)
> +				DRV_LOG(WARNING,
> +				"VTEP %s init failure (%d)", name, rte_errno);
> +		}
> +	} else {
> +		DRV_LOG(WARNING,
> +			"VTEP %s failed to get index (%d)", name, errno);
> +		rte_flow_error_set
> +			(error, -errno,
> +			 RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +			 !vtep->created ? "netlink: failed to create VTEP" :
> +			 "netlink: failed to retrieve VTEP ifindex");
> +			 ret = 1;

If it fails to create a vtep above, it will print out two warning messages and
one rte_flow_error message. And it even selects message to print between two?
And there's another info msg at the end even in case of failure. Do you really
want to do this even with manipulating ret to change code path?  Not a good
practice.

Usually, code path should be straightforward for sucessful path and for
errors/failures, return immediately or use 'goto' if there's need for cleanup.

Please refactor entire function.

> +	}
> +	if (ret) {
> +		flow_tcf_delete_iface(tcf, vtep);
> +		vtep = NULL;
> +	}
> +	DRV_LOG(INFO, "VTEP create (%d, %s)", vtep->port, vtep ? "OK" : "error");
> +	return vtep;
> +}
> +#endif /* HAVE_IFLA_VXLAN_COLLECT_METADATA */
> +
> +/**
> + * Create target interface index for VXLAN tunneling decapsulation.
> + * In order to share the UDP port within the other interfaces the
> + * VXLAN device created as not attached to any interface (if created).
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] dev_flow
> + *   Flow tcf object with tunnel structure pointer set.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + * @return
> + *   Interface index on success, zero otherwise and rte_errno is set.

Return negative errno in case of failure like others.

 *   Interface index on success, a negative errno value otherwise and rte_errno is set.

> + */
> +static unsigned int
> +flow_tcf_decap_vtep_create(struct mlx5_flow_tcf_context *tcf,
> +			   struct mlx5_flow *dev_flow,
> +			   struct rte_flow_error *error)
> +{
> +	struct mlx5_flow_tcf_vtep *vtep, *vlst;
> +	uint16_t port = dev_flow->tcf.vxlan_decap->udp_port;
> +
> +	vtep = NULL;
> +	LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> +		if (vlst->port == port) {
> +			vtep = vlst;
> +			break;
> +		}
> +	}

You just need one variable.

	struct mlx5_flow_tcf_vtep *vtep;

	LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
		if (vtep->port == port)
			break;
	}

> +	if (!vtep) {
> +		vtep = flow_tcf_create_iface(tcf, 0, port, error);
> +		if (vtep)
> +			LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
> +	} else {
> +		if (vtep->ifouter) {
> +			rte_flow_error_set(error, -errno,
> +				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +				"Failed to create decap VTEP, attached "
> +				"device with the same UDP port exists");
> +				vtep = NULL;

Making vtep null to skip the following code? Please merge the two same if/else
and make the code path strightforward. And which errno do you expect here?
Should it be set EEXIST instead?

> +		}
> +	}
> +	if (vtep) {
> +		vtep->refcnt++;
> +		assert(vtep->ifindex);
> +		return vtep->ifindex;
> +	} else {
> +		return 0;
> +	}

Why repeating same if/else?


This is my suggestion but if you take my suggestion to have
flow_tcf_[create|get|release]_iface(), this will get much simpler.

{
	struct mlx5_flow_tcf_vtep *vtep;
	uint16_t port = dev_flow->tcf.vxlan_decap->udp_port;

	LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
		if (vtep->port == port)
			break;
	}
	if (vtep && vtep->ifouter)
		return rte_flow_error_set(... EEXIST ...);
	else if (vtep) {
		++vtep->refcnt;
	} else {
		vtep = flow_tcf_create_iface(tcf, 0, port, error);
		if (!vtep)
			return rte_flow_error_set(...);
		LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
	}
	assert(vtep->ifindex);
	return vtep->ifindex;
}


> +}
> +
> +/**
> + * Creates target interface index for VXLAN tunneling encapsulation.
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] ifouter
> + *   Network interface index to attach VXLAN encap device to.
> + * @param[in] dev_flow
> + *   Flow tcf object with tunnel structure pointer set.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + * @return
> + *   Interface index on success, zero otherwise and rte_errno is set.
> + */
> +static unsigned int
> +flow_tcf_encap_vtep_create(struct mlx5_flow_tcf_context *tcf,
> +			    unsigned int ifouter,
> +			    struct mlx5_flow *dev_flow __rte_unused,
> +			    struct rte_flow_error *error)
> +{
> +	static uint16_t encap_port = MLX5_VXLAN_PORT_RANGE_MIN - 1;
> +	struct mlx5_flow_tcf_vtep *vtep, *vlst;
> +
> +	assert(ifouter);
> +	/* Look whether the attached VTEP for encap is created. */
> +	vtep = NULL;
> +	LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> +		if (vlst->ifouter == ifouter) {
> +			vtep = vlst;
> +			break;
> +		}
> +	}

Same here.

> +	if (!vtep) {
> +		uint16_t pcnt;
> +
> +		/* Not found, we should create the new attached VTEP. */
> +/*
> + * TODO: not implemented yet
> + * flow_tcf_encap_iface_cleanup(tcf, ifouter);
> + * flow_tcf_encap_local_cleanup(tcf, ifouter);
> + * flow_tcf_encap_neigh_cleanup(tcf, ifouter);
> + */

Personal note is not appropriate even though it is removed in the following
patch.

> +		for (pcnt = 0; pcnt <= (MLX5_VXLAN_PORT_RANGE_MAX
> +				     - MLX5_VXLAN_PORT_RANGE_MIN); pcnt++) {
> +			encap_port++;
> +			/* Wraparound the UDP port index. */
> +			if (encap_port < MLX5_VXLAN_PORT_RANGE_MIN ||
> +			    encap_port > MLX5_VXLAN_PORT_RANGE_MAX)
> +				encap_port = MLX5_VXLAN_PORT_RANGE_MIN;
> +			/* Check whether UDP port is in already in use. */
> +			vtep = NULL;
> +			LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> +				if (vlst->port == encap_port) {
> +					vtep = vlst;
> +					break;
> +				}
> +			}

If you want to find out an empty port number, you can use rte_bitmap instead of
repeating searching the entire list for all possible port numbers.

> +			if (vtep) {
> +				vtep = NULL;
> +				continue;
> +			}
> +			vtep = flow_tcf_create_iface(tcf, ifouter,
> +						     encap_port, error);
> +			if (vtep) {
> +				LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
> +				break;
> +			}
> +			if (rte_errno != EEXIST)
> +				break;
> +		}
> +	}
> +	if (!vtep)
> +		return 0;
> +	vtep->refcnt++;
> +	assert(vtep->ifindex);
> +	return vtep->ifindex;

Please refactor this func according to what I suggested for
flow_tcf_decap_vtep_create() and flow_tcf_delete_iface().

> +}
> +
> +/**
> + * Creates target interface index for tunneling of any type.
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] ifouter
> + *   Network interface index to attach VXLAN encap device to.
> + * @param[in] dev_flow
> + *   Flow tcf object with tunnel structure pointer set.
> + * @param[out] error
> + *   Perform verbose error reporting if not NULL.
> + * @return
> + *   Interface index on success, zero otherwise and rte_errno is set.

 *   Interface index on success, a negative errno value otherwise and
 *   rte_errno is set.

> + */
> +static unsigned int
> +flow_tcf_tunnel_vtep_create(struct mlx5_flow_tcf_context *tcf,
> +			    unsigned int ifouter,
> +			    struct mlx5_flow *dev_flow,
> +			    struct rte_flow_error *error)
> +{
> +	unsigned int ret;
> +
> +	assert(dev_flow->tcf.tunnel);
> +	pthread_mutex_lock(&vtep_list_mutex);
> +	switch (dev_flow->tcf.tunnel->type) {
> +	case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
> +		ret = flow_tcf_encap_vtep_create(tcf, ifouter,
> +						 dev_flow, error);
> +		break;
> +	case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
> +		ret = flow_tcf_decap_vtep_create(tcf, dev_flow, error);
> +		break;
> +	default:
> +		rte_flow_error_set(error, ENOTSUP,
> +				RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> +				"unsupported tunnel type");
> +		ret = 0;
> +		break;
> +	}
> +	pthread_mutex_unlock(&vtep_list_mutex);
> +	return ret;
> +}
> +
> +/**
> + * Deletes tunneling interface by UDP port.
> + *
> + * @param[in] tcf
> + *   Context object initialized by mlx5_flow_tcf_context_create().
> + * @param[in] ifindex
> + *   Network interface index of VXLAN device.
> + * @param[in] dev_flow
> + *   Flow tcf object with tunnel structure pointer set.
> + */
> +static void
> +flow_tcf_tunnel_vtep_delete(struct mlx5_flow_tcf_context *tcf,
> +			    unsigned int ifindex,
> +			    struct mlx5_flow *dev_flow)
> +{
> +	struct mlx5_flow_tcf_vtep *vtep, *vlst;
> +
> +	assert(dev_flow->tcf.tunnel);
> +	pthread_mutex_lock(&vtep_list_mutex);
> +	vtep = NULL;
> +	LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> +		if (vlst->ifindex == ifindex) {
> +			vtep = vlst;
> +			break;
> +		}
> +	}

It is weird. You just can have vtep pointer in the dev_flow->tcf.tunnel instead
of ifindex_tun which is same as vtep->ifindex like the assertion below. Then,
this lookup can be skipped.

> +	if (!vtep) {
> +		DRV_LOG(WARNING, "No VTEP device found in the list");
> +		goto exit;
> +	}
> +	switch (dev_flow->tcf.tunnel->type) {
> +	case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
> +		break;
> +	case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
> +/*
> + * TODO: Remove the encap ancillary rules first.
> + * flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL);
> + * flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL);
> + */

Is it a personal note? Please remove.

> +		break;
> +	default:
> +		assert(false);
> +		DRV_LOG(WARNING, "Unsupported tunnel type");
> +		break;
> +	}
> +	assert(dev_flow->tcf.tunnel->ifindex_tun == vtep->ifindex);
> +	assert(vtep->refcnt);
> +	if (!vtep->refcnt || !--vtep->refcnt) {
> +		LIST_REMOVE(vtep, next);
> +		flow_tcf_delete_iface(tcf, vtep);
> +	}
> +exit:
> +	pthread_mutex_unlock(&vtep_list_mutex);
> +}
> +
>  /**
>   * Apply flow to E-Switch by sending Netlink message.
>   *
> @@ -3461,18 +3887,61 @@ struct pedit_parser {
>  	       struct rte_flow_error *error)
>  {
>  	struct priv *priv = dev->data->dev_private;
> -	struct mlx5_flow_tcf_context *nl = priv->tcf_context;
> +	struct mlx5_flow_tcf_context *tcf = priv->tcf_context;
>  	struct mlx5_flow *dev_flow;
>  	struct nlmsghdr *nlh;
> +	int ret;
>  
>  	dev_flow = LIST_FIRST(&flow->dev_flows);
>  	/* E-Switch flow can't be expanded. */
>  	assert(!LIST_NEXT(dev_flow, next));
> +	if (dev_flow->tcf.applied)
> +		return 0;
>  	nlh = dev_flow->tcf.nlh;
>  	nlh->nlmsg_type = RTM_NEWTFILTER;
>  	nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_EXCL;
> -	if (!flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL))
> +	if (dev_flow->tcf.tunnel) {
> +		/*
> +		 * Replace the interface index, target for
> +		 * encapsulation, source for decapsulation.
> +		 */
> +		assert(!dev_flow->tcf.tunnel->ifindex_tun);
> +		assert(dev_flow->tcf.tunnel->ifindex_ptr);
> +		/* Create actual VTEP device when rule is being applied. */
> +		dev_flow->tcf.tunnel->ifindex_tun
> +			= flow_tcf_tunnel_vtep_create(tcf,
> +					*dev_flow->tcf.tunnel->ifindex_ptr,
> +					dev_flow, error);
> +			DRV_LOG(INFO, "Replace ifindex: %d->%d",
> +				dev_flow->tcf.tunnel->ifindex_tun,
> +				*dev_flow->tcf.tunnel->ifindex_ptr);
> +		if (!dev_flow->tcf.tunnel->ifindex_tun)
> +			return -rte_errno;
> +		dev_flow->tcf.tunnel->ifindex_org
> +			= *dev_flow->tcf.tunnel->ifindex_ptr;
> +		*dev_flow->tcf.tunnel->ifindex_ptr
> +			= dev_flow->tcf.tunnel->ifindex_tun;
> +	}
> +	ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> +	if (dev_flow->tcf.tunnel) {
> +		DRV_LOG(INFO, "Restore ifindex: %d->%d",
> +				dev_flow->tcf.tunnel->ifindex_org,
> +				*dev_flow->tcf.tunnel->ifindex_ptr);
> +		*dev_flow->tcf.tunnel->ifindex_ptr
> +			= dev_flow->tcf.tunnel->ifindex_org;
> +		dev_flow->tcf.tunnel->ifindex_org = 0;

ifindex_org looks a temporary storage in this code. And this kind of hassle
(replace/restore) is there because you took the ifindex from the netlink
message. Why don't you have just

struct mlx5_flow_tcf_tunnel_hdr {
	uint32_t type; /**< Tunnel action type. */
	unsigned int ifindex; /**< Original dst/src interface */
	struct mlx5_flow_tcf_vtep *vtep; /**< Tunnel endpoint device. */
	unsigned int *nlmsg_ifindex_ptr; /**< ifindex ptr in Netlink message. */
};

and don't change ifindex?

Thanks,
Yongseok

> +	}
> +	if (!ret) {
> +		dev_flow->tcf.applied = 1;
>  		return 0;
> +	}
> +	DRV_LOG(WARNING, "netlink: failed to create TC rule (%d)", rte_errno);
> +	if (dev_flow->tcf.tunnel->ifindex_tun) {
> +		flow_tcf_tunnel_vtep_delete(tcf,
> +					    dev_flow->tcf.tunnel->ifindex_tun,
> +					    dev_flow);
> +		dev_flow->tcf.tunnel->ifindex_tun = 0;
> +	}
>  	return rte_flow_error_set(error, rte_errno,
>  				  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
>  				  "netlink: failed to create TC flow rule");
> @@ -3490,7 +3959,7 @@ struct pedit_parser {
>  flow_tcf_remove(struct rte_eth_dev *dev, struct rte_flow *flow)
>  {
>  	struct priv *priv = dev->data->dev_private;
> -	struct mlx5_flow_tcf_context *nl = priv->tcf_context;
> +	struct mlx5_flow_tcf_context *tcf = priv->tcf_context;
>  	struct mlx5_flow *dev_flow;
>  	struct nlmsghdr *nlh;
>  
> @@ -3501,10 +3970,36 @@ struct pedit_parser {
>  		return;
>  	/* E-Switch flow can't be expanded. */
>  	assert(!LIST_NEXT(dev_flow, next));
> +	if (!dev_flow->tcf.applied)
> +		return;
> +	if (dev_flow->tcf.tunnel) {
> +		/*
> +		 * Replace the interface index, target for
> +		 * encapsulation, source for decapsulation.
> +		 */
> +		assert(dev_flow->tcf.tunnel->ifindex_tun);
> +		assert(dev_flow->tcf.tunnel->ifindex_ptr);
> +		dev_flow->tcf.tunnel->ifindex_org
> +			= *dev_flow->tcf.tunnel->ifindex_ptr;
> +		*dev_flow->tcf.tunnel->ifindex_ptr
> +			= dev_flow->tcf.tunnel->ifindex_tun;
> +	}
>  	nlh = dev_flow->tcf.nlh;
>  	nlh->nlmsg_type = RTM_DELTFILTER;
>  	nlh->nlmsg_flags = NLM_F_REQUEST;
> -	flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL);
> +	flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> +	if (dev_flow->tcf.tunnel) {
> +		*dev_flow->tcf.tunnel->ifindex_ptr
> +			= dev_flow->tcf.tunnel->ifindex_org;
> +		dev_flow->tcf.tunnel->ifindex_org = 0;
> +		if (dev_flow->tcf.tunnel->ifindex_tun) {
> +			flow_tcf_tunnel_vtep_delete(tcf,
> +					dev_flow->tcf.tunnel->ifindex_tun,
> +					dev_flow);
> +			dev_flow->tcf.tunnel->ifindex_tun = 0;
> +		}
> +	}
> +	dev_flow->tcf.applied = 0;
>  }
>  
>  /**
> 

  reply	other threads:[~2018-10-25  0:28 UTC|newest]

Thread overview: 110+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-02  6:30 [dpdk-dev] [PATCH 1/5] net/mlx5: add VXLAN encap/decap support for e-switch Slava Ovsiienko
2018-10-02  6:30 ` [dpdk-dev] [PATCH 2/5] net/mlx5: e-switch VXLAN netlink routines update Slava Ovsiienko
2018-10-02  6:30 ` [dpdk-dev] [PATCH 3/5] net/mlx5: e-switch VXLAN flow validation routine Slava Ovsiienko
2018-10-02  6:30 ` [dpdk-dev] [PATCH 4/5] net/mlx5: e-switch VXLAN flow translation routine Slava Ovsiienko
2018-10-02  6:30 ` [dpdk-dev] [PATCH 5/5] net/mlx5: e-switch VXLAN tunnel devices management Slava Ovsiienko
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 0/7] net/mlx5: e-switch VXLAN encap/decap hardware offload Viacheslav Ovsiienko
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 1/7] net/mlx5: e-switch VXLAN configuration and definitions Viacheslav Ovsiienko
2018-10-23 10:01     ` Yongseok Koh
2018-10-25 12:50       ` Slava Ovsiienko
2018-10-25 23:33         ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 2/7] net/mlx5: e-switch VXLAN flow validation routine Viacheslav Ovsiienko
2018-10-23 10:04     ` Yongseok Koh
2018-10-25 13:53       ` Slava Ovsiienko
2018-10-26  3:07         ` Yongseok Koh
2018-10-26  8:39           ` Slava Ovsiienko
2018-10-26 21:56             ` Yongseok Koh
2018-10-29  9:33               ` Slava Ovsiienko
2018-10-29 18:26                 ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 3/7] net/mlx5: e-switch VXLAN flow translation routine Viacheslav Ovsiienko
2018-10-23 10:06     ` Yongseok Koh
2018-10-25 14:37       ` Slava Ovsiienko
2018-10-26  4:22         ` Yongseok Koh
2018-10-26  9:06           ` Slava Ovsiienko
2018-10-26 22:10             ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 4/7] net/mlx5: e-switch VXLAN netlink routines update Viacheslav Ovsiienko
2018-10-23 10:07     ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management Viacheslav Ovsiienko
2018-10-25  0:28     ` Yongseok Koh [this message]
2018-10-25 20:21       ` Slava Ovsiienko
2018-10-26  6:25         ` Yongseok Koh
2018-10-26  9:35           ` Slava Ovsiienko
2018-10-26 22:42             ` Yongseok Koh
2018-10-29 11:53               ` Slava Ovsiienko
2018-10-29 18:42                 ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 6/7] net/mlx5: e-switch VXLAN encapsulation rules management Viacheslav Ovsiienko
2018-10-25  0:33     ` Yongseok Koh
2018-10-15 14:13   ` [dpdk-dev] [PATCH v2 7/7] net/mlx5: e-switch VXLAN rule cleanup routines Viacheslav Ovsiienko
2018-10-25  0:36     ` Yongseok Koh
2018-10-25 20:32       ` Slava Ovsiienko
2018-10-26  6:30         ` Yongseok Koh
2018-11-01 12:19   ` [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Slava Ovsiienko
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 01/13] net/mlx5: prepare makefile for adding e-switch VXLAN Slava Ovsiienko
2018-11-01 20:33       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 02/13] net/mlx5: prepare meson.build " Slava Ovsiienko
2018-11-01 20:33       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-01 20:35       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-01 20:36       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 05/13] net/mlx5: swap items/actions validations for e-switch rules Slava Ovsiienko
2018-11-01 20:37       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 06/13] net/mlx5: add e-switch VXLAN support to validation routine Slava Ovsiienko
2018-11-01 20:49       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-01 21:03       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-01 21:18       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 09/13] net/mlx5: e-switch VXLAN netlink routines update Slava Ovsiienko
2018-11-01 21:21       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 10/13] net/mlx5: fix e-switch Flow counter deletion Slava Ovsiienko
2018-11-01 22:00       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 11/13] net/mlx5: add e-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-01 23:59       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 12/13] net/mlx5: add e-switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-02  0:01       ` Yongseok Koh
2018-11-01 12:19     ` [dpdk-dev] [PATCH v3 13/13] net/mlx5: add e-switch VXLAN rule cleanup routines Slava Ovsiienko
2018-11-02  0:01       ` Yongseok Koh
2018-11-01 20:32     ` [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Yongseok Koh
2018-11-02 17:53     ` [dpdk-dev] [PATCH v4 " Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 01/13] net/mlx5: prepare makefile for adding E-Switch VXLAN Slava Ovsiienko
2018-11-03  6:18         ` [dpdk-dev] [PATCH v5 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 01/13] net/mlx5: prepare makefile for adding E-Switch VXLAN Slava Ovsiienko
2018-11-12 20:01             ` [dpdk-dev] [PATCH 0/4] net/mlx5: prepare to add E-switch rule flags check Slava Ovsiienko
2018-11-12 20:01               ` [dpdk-dev] [PATCH 1/4] net/mlx5: prepare Netlink communication routine to fix Slava Ovsiienko
2018-11-13 13:21                 ` Shahaf Shuler
2018-11-12 20:01               ` [dpdk-dev] [PATCH 2/4] net/mlx5: fix Netlink communication routine Slava Ovsiienko
2018-11-13 13:21                 ` Shahaf Shuler
2018-11-14 12:57                   ` Slava Ovsiienko
2018-11-12 20:01               ` [dpdk-dev] [PATCH 3/4] net/mlx5: prepare to add E-switch rule flags check Slava Ovsiienko
2018-11-12 20:01               ` [dpdk-dev] [PATCH 4/4] net/mlx5: add E-switch rule hardware offload flag check Slava Ovsiienko
2018-11-13 13:21               ` [dpdk-dev] [PATCH 0/4] net/mlx5: prepare to add E-switch rule flags check Shahaf Shuler
2018-11-14 14:56                 ` Shahaf Shuler
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 02/13] net/mlx5: prepare meson.build for adding E-Switch VXLAN Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 05/13] net/mlx5: swap items/actions validations for E-Switch rules Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 06/13] net/mlx5: add E-Switch VXLAN support to validation routine Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 09/13] net/mlx5: update E-Switch VXLAN netlink routines Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 10/13] net/mlx5: fix E-Switch Flow counter deletion Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 11/13] net/mlx5: add E-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 12/13] net/mlx5: add E-Switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-03  6:18           ` [dpdk-dev] [PATCH v5 13/13] net/mlx5: add E-switch VXLAN rule cleanup routines Slava Ovsiienko
2018-11-04  6:48           ` [dpdk-dev] [PATCH v5 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Shahaf Shuler
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 02/13] net/mlx5: prepare meson.build for adding E-Switch VXLAN Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 05/13] net/mlx5: swap items/actions validations for E-Switch rules Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-02 21:38         ` Yongseok Koh
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 06/13] net/mlx5: add E-Switch VXLAN support to validation routine Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-02 21:53         ` Yongseok Koh
2018-11-02 23:29           ` Yongseok Koh
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 09/13] net/mlx5: update E-Switch VXLAN netlink routines Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 10/13] net/mlx5: fix E-Switch Flow counter deletion Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 11/13] net/mlx5: add E-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 12/13] net/mlx5: add E-Switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-02 17:53       ` [dpdk-dev] [PATCH v4 13/13] net/mlx5: add E-switch VXLAN rule cleanup routines Slava Ovsiienko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181025002759.GA26874@mtidpdk.mti.labs.mlnx \
    --to=yskoh@mellanox.com \
    --cc=dev@dpdk.org \
    --cc=shahafs@mellanox.com \
    --cc=viacheslavo@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ https://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git