From: Yongseok Koh <yskoh@mellanox.com>
To: Slava Ovsiienko <viacheslavo@mellanox.com>
Cc: Shahaf Shuler <shahafs@mellanox.com>, "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management
Date: Fri, 26 Oct 2018 06:25:48 +0000 [thread overview]
Message-ID: <20181026062307.GD6434@mtidpdk.mti.labs.mlnx> (raw)
In-Reply-To: <AM4PR05MB326594413216A252B5194833D2F70@AM4PR05MB3265.eurprd05.prod.outlook.com>
On Thu, Oct 25, 2018 at 01:21:12PM -0700, Slava Ovsiienko wrote:
> > -----Original Message-----
> > From: Yongseok Koh
> > Sent: Thursday, October 25, 2018 3:28
> > To: Slava Ovsiienko <viacheslavo@mellanox.com>
> > Cc: Shahaf Shuler <shahafs@mellanox.com>; dev@dpdk.org
> > Subject: Re: [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices
> > management
> >
> > On Mon, Oct 15, 2018 at 02:13:33PM +0000, Viacheslav Ovsiienko wrote:
> > > VXLAN interfaces are dynamically created for each local UDP port of
> > > outer networks and then used as targets for TC "flower" filters in
> > > order to perform encapsulation. These VXLAN interfaces are
> > > system-wide, the only one device with given UDP port can exist in the
> > > system (the attempt of creating another device with the same UDP local
> > > port returns EEXIST), so PMD should support the shared device
> > > instances database for PMD instances. These VXLAN implicitly created
> > > devices are called VTEPs (Virtual Tunnel End Points).
> > >
> > > Creation of the VTEP occurs at the moment of rule applying. The link
> > > is set up, root ingress qdisc is also initialized.
> > >
> > > Encapsulation VTEPs are created on per port basis, the single VTEP is
> > > attached to the outer interface and is shared for all encapsulation
> > > rules on this interface. The source UDP port is automatically selected
> > > in range 30000-60000.
> > >
> > > For decapsulaton one VTEP is created per every unique UDP local port
> > > to accept tunnel traffic. The name of created VTEP consists of prefix
> > > "vmlx_" and the number of UDP port in decimal digits without leading
> > > zeros (vmlx_4789). The VTEP can be preliminary created in the system
> > > before the launching
> > > application, it allows to share UDP ports between primary
> > > and secondary processes.
> > >
> > > Suggested-by: Adrien Mazarguil <adrien.mazarguil@6wind.com>
> > > Signed-off-by: Viacheslav Ovsiienko <viacheslavo@mellanox.com>
> > > ---
> > > drivers/net/mlx5/mlx5_flow_tcf.c | 503
> > > ++++++++++++++++++++++++++++++++++++++-
> > > 1 file changed, 499 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/mlx5/mlx5_flow_tcf.c
> > > b/drivers/net/mlx5/mlx5_flow_tcf.c
> > > index d6840d5..efa9c3b 100644
> > > --- a/drivers/net/mlx5/mlx5_flow_tcf.c
> > > +++ b/drivers/net/mlx5/mlx5_flow_tcf.c
> > > @@ -3443,6 +3443,432 @@ struct pedit_parser {
> > > return -err;
> > > }
> > >
> > > +/* VTEP device list is shared between PMD port instances. */ static
> > > +LIST_HEAD(, mlx5_flow_tcf_vtep)
> > > + vtep_list_vxlan = LIST_HEAD_INITIALIZER(); static
> > pthread_mutex_t
> > > +vtep_list_mutex = PTHREAD_MUTEX_INITIALIZER;
> >
> > What's the reason for choosing pthread_mutex instead of rte_*_lock?
>
> The sharing this database for secondary processes?
The static variable isn't shared with sec proc. But you can leave it as is.
> > > +
> > > +/**
> > > + * Deletes VTEP network device.
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] vtep
> > > + * Object represinting the network device to delete. Memory
> > > + * allocated for this object is freed by routine.
> > > + */
> > > +static void
> > > +flow_tcf_delete_iface(struct mlx6_flow_tcf_context *tcf,
> > > + struct mlx5_flow_tcf_vtep *vtep) {
> > > + struct nlmsghdr *nlh;
> > > + struct ifinfomsg *ifm;
> > > + alignas(struct nlmsghdr)
> > > + uint8_t buf[mnl_nlmsg_size(MNL_ALIGN(sizeof(*ifm))) + 8];
> > > + int ret;
> > > +
> > > + assert(!vtep->refcnt);
> > > + if (vtep->created && vtep->ifindex) {
> >
> > First of all vtep->created seems of no use. It is introduced to select the error
> > message in flow_tcf_create_iface(). I don't see any necessity to distinguish
> > between 'vtep is allocated by rte_malloc()' and 'vtep is created in kernel'.
>
> created flag indicates the iface is created by our code.
> The VXLAN decap devices must have the specified UDP port, we can not create
> multiple VXLAN devices with the same UDP port - EEXIST is returned. So, we have
> to share device. One option is create device before DPDK application launch and use
> these pre-created devices. Inthis case created flag is not set and VXLAN device
> is not reinitialized, and not deleted.
I can't see any code to use pre-created device (created even before dpdk app
launch). Your code just tries to create 'vmlx_xxxx'. Even from your comment in
[7/7] patch, PMD will cleanup any leftovers (existing vtep devices) on
initialization. Your comment sounds conflicting and confusing.
> > And why do you need to check vtep->ifindex as well? If vtep is created in
> > kernel and its ifindex isn't set, that should be an error which had to be hanled
> > in flow_tcf_create_iface(). Such a vtep shouldn't exist.
> Yes, if we did not get ifindex of device - vtep is not created, error returned.
> We just can not operate w/o ifindex.
I know ifindex is needed but my question was checking vtep->ifindex here looked
redundant/unnecessary. But as you agreed on having create/get/release_iface(),
it doesn't matter much.
> > Also, the refcnt management is a bit strange. Please put an abstraction by
> > adding create_iface(), get_iface() and release_iface(). In the get_ifce(),
> > vtep->refcnt should be incremented. And in the release_iface(), it
> > vtep->decrease the
> OK. Good proposal. I'll refactor the code.
>
> > refcnt and if it reaches to zero, the iface can be removed. create_iface() will
> > set the refcnt to 1. And if you refer to mlx5_hrxq_get(), it even does
> > searching the list not by repeating the same lookup code here and there.
> > That will make your code much simpler.
> >
> > > + DRV_LOG(INFO, "VTEP delete (%d)", vtep->ifindex);
> > > + nlh = mnl_nlmsg_put_header(buf);
> > > + nlh->nlmsg_type = RTM_DELLINK;
> > > + nlh->nlmsg_flags = NLM_F_REQUEST;
> > > + ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> > > + ifm->ifi_family = AF_UNSPEC;
> > > + ifm->ifi_index = vtep->ifindex;
> > > + ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> > > + if (ret)
> > > + DRV_LOG(WARNING, "netlink: error deleting VXLAN
> > "
> > > + "encap/decap ifindex %u",
> > > + ifm->ifi_index);
> > > + }
> > > + rte_free(vtep);
> > > +}
> > > +
> > > +/**
> > > + * Creates VTEP network device.
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] ifouter
> > > + * Outer interface to attach new-created VXLAN device
> > > + * If zero the VXLAN device will not be attached to any device.
> > > + * @param[in] port
> > > + * UDP port of created VTEP device.
> > > + * @param[out] error
> > > + * Perform verbose error reporting if not NULL.
> > > + *
> > > + * @return
> > > + * Pointer to created device structure on success, NULL otherwise
> > > + * and rte_errno is set.
> > > + */
> > > +#ifndef HAVE_IFLA_VXLAN_COLLECT_METADATA
> >
> > Why negative(ifndef) first intead of positive(ifdef)?
> Hm. Did I miss the rule. Positive #ifdef first? OK.
No concrete rule but if there's no specific reason, it would be better to start
from ifdef.
> > > +static struct mlx5_flow_tcf_vtep*
> > > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf __rte_unused,
> > > + unsigned int ifouter __rte_unused,
> > > + uint16_t port __rte_unused,
> > > + struct rte_flow_error *error) {
> > > + rte_flow_error_set(error, ENOTSUP,
> > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > > + "netlink: failed to create VTEP, "
> > > + "VXLAN metadat is not supported by kernel");
> >
> > Typo.
>
> OK. "metadata are not supported".
> >
> > > + return NULL;
> > > +}
> > > +#else
> > > +static struct mlx5_flow_tcf_vtep*
> > > +flow_tcf_create_iface(struct mlx5_flow_tcf_context *tcf,
> >
> > How about adding 'vtep'? It sounds vague - creating a general interface.
> > E.g., flow_tcf_create_vtep_iface()?
>
> OK.
>
> >
> > > + unsigned int ifouter,
> > > + uint16_t port, struct rte_flow_error *error) {
> > > + struct mlx5_flow_tcf_vtep *vtep;
> > > + struct nlmsghdr *nlh;
> > > + struct ifinfomsg *ifm;
> > > + char name[sizeof(MLX5_VXLAN_DEVICE_PFX) + 24];
> > > + alignas(struct nlmsghdr)
> > > + uint8_t buf[mnl_nlmsg_size(sizeof(*ifm)) + 128 +
> >
> > Use a macro for '128'. Can't know the meaning.
> OK. I think we should calculate the buffer size explicitly.
>
> >
> > > + SZ_NLATTR_DATA_OF(sizeof(name)) +
> > > + SZ_NLATTR_NEST * 2 +
> > > + SZ_NLATTR_STRZ_OF("vxlan") +
> > > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) +
> > > + SZ_NLATTR_DATA_OF(sizeof(uint32_t)) +
> > > + SZ_NLATTR_DATA_OF(sizeof(uint16_t)) +
> > > + SZ_NLATTR_DATA_OF(sizeof(uint8_t))];
> > > + struct nlattr *na_info;
> > > + struct nlattr *na_vxlan;
> > > + rte_be16_t vxlan_port = RTE_BE16(port);
> >
> > Use rte_cpu_to_be_*() instead.
>
> Yes, I'll recheck the whole code for this issue.
>
> >
> > > + int ret;
> > > +
> > > + vtep = rte_zmalloc(__func__, sizeof(*vtep),
> > > + alignof(struct mlx5_flow_tcf_vtep));
> > > + if (!vtep) {
> > > + rte_flow_error_set
> > > + (error, ENOMEM,
> > RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > > + NULL, "unadble to allocate memory for VTEP desc");
> > > + return NULL;
> > > + }
> > > + *vtep = (struct mlx5_flow_tcf_vtep){
> > > + .refcnt = 0,
> > > + .port = port,
> > > + .created = 0,
> > > + .ifouter = 0,
> > > + .ifindex = 0,
> > > + .local = LIST_HEAD_INITIALIZER(),
> > > + .neigh = LIST_HEAD_INITIALIZER(),
> > > + };
> > > + memset(buf, 0, sizeof(buf));
> > > + nlh = mnl_nlmsg_put_header(buf);
> > > + nlh->nlmsg_type = RTM_NEWLINK;
> > > + nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
> > NLM_F_EXCL;
> > > + ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> > > + ifm->ifi_family = AF_UNSPEC;
> > > + ifm->ifi_type = 0;
> > > + ifm->ifi_index = 0;
> > > + ifm->ifi_flags = IFF_UP;
> > > + ifm->ifi_change = 0xffffffff;
> > > + snprintf(name, sizeof(name), "%s%u", MLX5_VXLAN_DEVICE_PFX,
> > port);
> > > + mnl_attr_put_strz(nlh, IFLA_IFNAME, name);
> > > + na_info = mnl_attr_nest_start(nlh, IFLA_LINKINFO);
> > > + assert(na_info);
> > > + mnl_attr_put_strz(nlh, IFLA_INFO_KIND, "vxlan");
> > > + na_vxlan = mnl_attr_nest_start(nlh, IFLA_INFO_DATA);
> > > + if (ifouter)
> > > + mnl_attr_put_u32(nlh, IFLA_VXLAN_LINK, ifouter);
> > > + assert(na_vxlan);
> > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_COLLECT_METADATA, 1);
> > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_UDP_ZERO_CSUM6_RX, 1);
> > > + mnl_attr_put_u8(nlh, IFLA_VXLAN_LEARNING, 0);
> > > + mnl_attr_put_u16(nlh, IFLA_VXLAN_PORT, vxlan_port);
> > > + mnl_attr_nest_end(nlh, na_vxlan);
> > > + mnl_attr_nest_end(nlh, na_info);
> > > + assert(sizeof(buf) >= nlh->nlmsg_len);
> > > + ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> > > + if (ret)
> > > + DRV_LOG(WARNING,
> > > + "netlink: VTEP %s create failure (%d)",
> > > + name, rte_errno);
> > > + else
> > > + vtep->created = 1;
> >
> > Flow of code here isn't smooth, thus could be error-prone. Most of all, I
> > don't like ret has multiple meanings. ret should be return value but you are
> > using it to store ifindex.
> >
> > > + if (ret && ifouter)
> > > + ret = 0;
> > > + else
> > > + ret = if_nametoindex(name);
> >
> > If vtep isn't created and ifouter is set, then skip init below, which means, if
>
> ifouter is set for VXLAN encap devices. They should be attached to ifouter
> and can not be shared. So, if ifouter I set - we do not use the precreated/existing
> VXLAN devices. We have to create our own not shared device.
In your code (flow_tcf_encap_vtep_create()), it is shared by multiple flows. Do
you mean it isn't shared between different outer ifaces? If so, that's for sure.
> > vtep is created or ifouter is set, it tries to get ifindex of vtep.
> > But why do you want to try to call this API even if it failed to create vtep?
> > Let's not make code flow convoluted even though it logically works. Let's
> > make it straightforward.
> >
> > > + if (ret) {
> > > + vtep->ifindex = ret;
> > > + vtep->ifouter = ifouter;
> > > + memset(buf, 0, sizeof(buf));
> > > + nlh = mnl_nlmsg_put_header(buf);
> > > + nlh->nlmsg_type = RTM_NEWLINK;
> > > + nlh->nlmsg_flags = NLM_F_REQUEST;
> > > + ifm = mnl_nlmsg_put_extra_header(nlh, sizeof(*ifm));
> > > + ifm->ifi_family = AF_UNSPEC;
> > > + ifm->ifi_type = 0;
> > > + ifm->ifi_index = vtep->ifindex;
> > > + ifm->ifi_flags = IFF_UP;
> > > + ifm->ifi_change = IFF_UP;
> > > + ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> > > + if (ret) {
> > > + DRV_LOG(WARNING,
> > > + "netlink: VTEP %s set link up failure (%d)",
> > > + name, rte_errno);
> > > + rte_free(vtep);
> > > + rte_flow_error_set
> > > + (error, -errno,
> > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > NULL,
> > > + "netlink: failed to set VTEP link up");
> > > + vtep = NULL;
> > > + } else {
> > > + ret = mlx5_flow_tcf_init(tcf, vtep->ifindex, error);
> > > + if (ret)
> > > + DRV_LOG(WARNING,
> > > + "VTEP %s init failure (%d)", name, rte_errno);
> > > + }
> > > + } else {
> > > + DRV_LOG(WARNING,
> > > + "VTEP %s failed to get index (%d)", name, errno);
> > > + rte_flow_error_set
> > > + (error, -errno,
> > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> > > + !vtep->created ? "netlink: failed to create VTEP" :
> > > + "netlink: failed to retrieve VTEP ifindex");
> > > + ret = 1;
> >
> > If it fails to create a vtep above, it will print out two warning messages and
> > one rte_flow_error message. And it even selects message to print between
> > two?
> > And there's another info msg at the end even in case of failure. Do you really
> > want to do this even with manipulating ret to change code path? Not a good
> > practice.
> >
> > Usually, code path should be straightforward for sucessful path and for
> > errors/failures, return immediately or use 'goto' if there's need for cleanup.
> >
> > Please refactor entire function.
>
> I think I'll split it in two ones - for attached and potentially shared ifaces.
> >
> > > + }
> > > + if (ret) {
> > > + flow_tcf_delete_iface(tcf, vtep);
> > > + vtep = NULL;
> > > + }
> > > + DRV_LOG(INFO, "VTEP create (%d, %s)", vtep->port, vtep ? "OK" :
> > "error");
> > > + return vtep;
> > > +}
> > > +#endif /* HAVE_IFLA_VXLAN_COLLECT_METADATA */
> > > +
> > > +/**
> > > + * Create target interface index for VXLAN tunneling decapsulation.
> > > + * In order to share the UDP port within the other interfaces the
> > > + * VXLAN device created as not attached to any interface (if created).
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] dev_flow
> > > + * Flow tcf object with tunnel structure pointer set.
> > > + * @param[out] error
> > > + * Perform verbose error reporting if not NULL.
> > > + * @return
> > > + * Interface index on success, zero otherwise and rte_errno is set.
> >
> > Return negative errno in case of failure like others.
>
> Anyway, we have to return an index. If we do not return it as function result
> we will need to provide some extra pointing parameter, it complicates the code.
You misunderstood it. See what I wrote below. The function still returns the
index but in case of error, make it return negative errno instead of zero.
> >
> > * Interface index on success, a negative errno value otherwise and
> > rte_errno is set.
> >
> > > + */
> > > +static unsigned int
> > > +flow_tcf_decap_vtep_create(struct mlx5_flow_tcf_context *tcf,
> > > + struct mlx5_flow *dev_flow,
> > > + struct rte_flow_error *error)
> > > +{
> > > + struct mlx5_flow_tcf_vtep *vtep, *vlst;
> > > + uint16_t port = dev_flow->tcf.vxlan_decap->udp_port;
> > > +
> > > + vtep = NULL;
> > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> > > + if (vlst->port == port) {
> > > + vtep = vlst;
> > > + break;
> > > + }
> > > + }
> >
> > You just need one variable.
>
> Yes. There is a long story, I forgot to revert code to one variable after debugging.
> >
> > struct mlx5_flow_tcf_vtep *vtep;
> >
> > LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
> > if (vtep->port == port)
> > break;
> > }
> >
> > > + if (!vtep) {
> > > + vtep = flow_tcf_create_iface(tcf, 0, port, error);
> > > + if (vtep)
> > > + LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
> > > + } else {
> > > + if (vtep->ifouter) {
> > > + rte_flow_error_set(error, -errno,
> > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > NULL,
> > > + "Failed to create decap VTEP, attached "
> > > + "device with the same UDP port exists");
> > > + vtep = NULL;
> >
> > Making vtep null to skip the following code?
>
> Yes. To avoid multiple return operators in code.
It's okay to have multiple returns. Why not?
> > Please merge the two same
> > if/else and make the code path strightforward. And which errno do you
> > expect here?
> > Should it be set EEXIST instead?
> Not always. Netlink returns the code.
No, that's not my point. Your code above sets errno instead of rte_errno or
EEXIST.
} else {
if (vtep->ifouter) {
rte_flow_error_set(error, -errno,
Which one sets this errno? Here, it sets rte_errno because matched vtep can't be
used as it already has outer iface attached (error message isn't clear, please
reword it too). I thought this should be EEXIST but you set errno to rte_errno
but errno isn't valid at this point.
>
> >
> > > + }
> > > + }
> > > + if (vtep) {
> > > + vtep->refcnt++;
> > > + assert(vtep->ifindex);
> > > + return vtep->ifindex;
> > > + } else {
> > > + return 0;
> > > + }
> >
> > Why repeating same if/else?
> >
> >
> > This is my suggestion but if you take my suggestion to have
> > flow_tcf_[create|get|release]_iface(), this will get much simpler.
> Agree.
>
> >
> > {
> > struct mlx5_flow_tcf_vtep *vtep;
> > uint16_t port = dev_flow->tcf.vxlan_decap->udp_port;
> >
> > LIST_FOREACH(vtep, &vtep_list_vxlan, next) {
> > if (vtep->port == port)
> > break;
> > }
> > if (vtep && vtep->ifouter)
> > return rte_flow_error_set(... EEXIST ...);
> > else if (vtep) {
> > ++vtep->refcnt;
> > } else {
> > vtep = flow_tcf_create_iface(tcf, 0, port, error);
> > if (!vtep)
> > return rte_flow_error_set(...);
> > LIST_INSERT_HEAD(&vtep_list_vxlan, vtep, next);
> > }
> > assert(vtep->ifindex);
> > return vtep->ifindex;
> > }
> >
> >
> > > +}
> > > +
> > > +/**
> > > + * Creates target interface index for VXLAN tunneling encapsulation.
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] ifouter
> > > + * Network interface index to attach VXLAN encap device to.
> > > + * @param[in] dev_flow
> > > + * Flow tcf object with tunnel structure pointer set.
> > > + * @param[out] error
> > > + * Perform verbose error reporting if not NULL.
> > > + * @return
> > > + * Interface index on success, zero otherwise and rte_errno is set.
> > > + */
> > > +static unsigned int
> > > +flow_tcf_encap_vtep_create(struct mlx5_flow_tcf_context *tcf,
> > > + unsigned int ifouter,
> > > + struct mlx5_flow *dev_flow __rte_unused,
> > > + struct rte_flow_error *error)
> > > +{
> > > + static uint16_t encap_port = MLX5_VXLAN_PORT_RANGE_MIN - 1;
> > > + struct mlx5_flow_tcf_vtep *vtep, *vlst;
> > > +
> > > + assert(ifouter);
> > > + /* Look whether the attached VTEP for encap is created. */
> > > + vtep = NULL;
> > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> > > + if (vlst->ifouter == ifouter) {
> > > + vtep = vlst;
> > > + break;
> > > + }
> > > + }
> >
> > Same here.
> >
> > > + if (!vtep) {
> > > + uint16_t pcnt;
> > > +
> > > + /* Not found, we should create the new attached VTEP. */
> > > +/*
> > > + * TODO: not implemented yet
> > > + * flow_tcf_encap_iface_cleanup(tcf, ifouter);
> > > + * flow_tcf_encap_local_cleanup(tcf, ifouter);
> > > + * flow_tcf_encap_neigh_cleanup(tcf, ifouter); */
> >
> > Personal note is not appropriate even though it is removed in the following
> > patch.
> >
> > > + for (pcnt = 0; pcnt <= (MLX5_VXLAN_PORT_RANGE_MAX
> > > + - MLX5_VXLAN_PORT_RANGE_MIN);
> > pcnt++) {
> > > + encap_port++;
> > > + /* Wraparound the UDP port index. */
> > > + if (encap_port < MLX5_VXLAN_PORT_RANGE_MIN
> > ||
> > > + encap_port > MLX5_VXLAN_PORT_RANGE_MAX)
> > > + encap_port =
> > MLX5_VXLAN_PORT_RANGE_MIN;
> > > + /* Check whether UDP port is in already in use. */
> > > + vtep = NULL;
> > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> > > + if (vlst->port == encap_port) {
> > > + vtep = vlst;
> > > + break;
> > > + }
> > > + }
> >
> > If you want to find out an empty port number, you can use rte_bitmap
> > instead of repeating searching the entire list for all possible port numbers.
>
> We do not expect too many VXLAN devices have been created. bitmap.
+1, valid point.
> > > + if (vtep) {
> > > + vtep = NULL;
> > > + continue;
> > > + }
> > > + vtep = flow_tcf_create_iface(tcf, ifouter,
> > > + encap_port, error);
> > > + if (vtep) {
> > > + LIST_INSERT_HEAD(&vtep_list_vxlan, vtep,
> > next);
> > > + break;
> > > + }
> > > + if (rte_errno != EEXIST)
> > > + break;
> > > + }
> > > + }
> > > + if (!vtep)
> > > + return 0;
> > > + vtep->refcnt++;
> > > + assert(vtep->ifindex);
> > > + return vtep->ifindex;
> >
> > Please refactor this func according to what I suggested for
> > flow_tcf_decap_vtep_create() and flow_tcf_delete_iface().
> >
> > > +}
> > > +
> > > +/**
> > > + * Creates target interface index for tunneling of any type.
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] ifouter
> > > + * Network interface index to attach VXLAN encap device to.
> > > + * @param[in] dev_flow
> > > + * Flow tcf object with tunnel structure pointer set.
> > > + * @param[out] error
> > > + * Perform verbose error reporting if not NULL.
> > > + * @return
> > > + * Interface index on success, zero otherwise and rte_errno is set.
> >
> > * Interface index on success, a negative errno value otherwise and
> > * rte_errno is set.
> >
> > > + */
> > > +static unsigned int
> > > +flow_tcf_tunnel_vtep_create(struct mlx5_flow_tcf_context *tcf,
> > > + unsigned int ifouter,
> > > + struct mlx5_flow *dev_flow,
> > > + struct rte_flow_error *error)
> > > +{
> > > + unsigned int ret;
> > > +
> > > + assert(dev_flow->tcf.tunnel);
> > > + pthread_mutex_lock(&vtep_list_mutex);
> > > + switch (dev_flow->tcf.tunnel->type) {
> > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
> > > + ret = flow_tcf_encap_vtep_create(tcf, ifouter,
> > > + dev_flow, error);
> > > + break;
> > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
> > > + ret = flow_tcf_decap_vtep_create(tcf, dev_flow, error);
> > > + break;
> > > + default:
> > > + rte_flow_error_set(error, ENOTSUP,
> > > + RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
> > NULL,
> > > + "unsupported tunnel type");
> > > + ret = 0;
> > > + break;
> > > + }
> > > + pthread_mutex_unlock(&vtep_list_mutex);
> > > + return ret;
> > > +}
> > > +
> > > +/**
> > > + * Deletes tunneling interface by UDP port.
> > > + *
> > > + * @param[in] tcf
> > > + * Context object initialized by mlx5_flow_tcf_context_create().
> > > + * @param[in] ifindex
> > > + * Network interface index of VXLAN device.
> > > + * @param[in] dev_flow
> > > + * Flow tcf object with tunnel structure pointer set.
> > > + */
> > > +static void
> > > +flow_tcf_tunnel_vtep_delete(struct mlx5_flow_tcf_context *tcf,
> > > + unsigned int ifindex,
> > > + struct mlx5_flow *dev_flow)
> > > +{
> > > + struct mlx5_flow_tcf_vtep *vtep, *vlst;
> > > +
> > > + assert(dev_flow->tcf.tunnel);
> > > + pthread_mutex_lock(&vtep_list_mutex);
> > > + vtep = NULL;
> > > + LIST_FOREACH(vlst, &vtep_list_vxlan, next) {
> > > + if (vlst->ifindex == ifindex) {
> > > + vtep = vlst;
> > > + break;
> > > + }
> > > + }
> >
> > It is weird. You just can have vtep pointer in the dev_flow->tcf.tunnel instead
> > of ifindex_tun which is same as vtep->ifindex like the assertion below. Then,
> > this lookup can be skipped.
>
> OK. Good optimization.
>
> >
> > > + if (!vtep) {
> > > + DRV_LOG(WARNING, "No VTEP device found in the list");
> > > + goto exit;
> > > + }
> > > + switch (dev_flow->tcf.tunnel->type) {
> > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_DECAP:
> > > + break;
> > > + case MLX5_FLOW_TCF_TUNACT_VXLAN_ENCAP:
> > > +/*
> > > + * TODO: Remove the encap ancillary rules first.
> > > + * flow_tcf_encap_neigh(tcf, vtep, dev_flow, false, NULL);
> > > + * flow_tcf_encap_local(tcf, vtep, dev_flow, false, NULL); */
> >
> > Is it a personal note? Please remove.
> OK.
>
> >
> > > + break;
> > > + default:
> > > + assert(false);
> > > + DRV_LOG(WARNING, "Unsupported tunnel type");
> > > + break;
> > > + }
> > > + assert(dev_flow->tcf.tunnel->ifindex_tun == vtep->ifindex);
> > > + assert(vtep->refcnt);
> > > + if (!vtep->refcnt || !--vtep->refcnt) {
> > > + LIST_REMOVE(vtep, next);
> > > + flow_tcf_delete_iface(tcf, vtep);
> > > + }
> > > +exit:
> > > + pthread_mutex_unlock(&vtep_list_mutex);
> > > +}
> > > +
> > > /**
> > > * Apply flow to E-Switch by sending Netlink message.
> > > *
> > > @@ -3461,18 +3887,61 @@ struct pedit_parser {
> > > struct rte_flow_error *error) {
> > > struct priv *priv = dev->data->dev_private;
> > > - struct mlx5_flow_tcf_context *nl = priv->tcf_context;
> > > + struct mlx5_flow_tcf_context *tcf = priv->tcf_context;
> > > struct mlx5_flow *dev_flow;
> > > struct nlmsghdr *nlh;
> > > + int ret;
> > >
> > > dev_flow = LIST_FIRST(&flow->dev_flows);
> > > /* E-Switch flow can't be expanded. */
> > > assert(!LIST_NEXT(dev_flow, next));
> > > + if (dev_flow->tcf.applied)
> > > + return 0;
> > > nlh = dev_flow->tcf.nlh;
> > > nlh->nlmsg_type = RTM_NEWTFILTER;
> > > nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE |
> > NLM_F_EXCL;
> > > - if (!flow_tcf_nl_ack(nl, nlh, 0, NULL, NULL))
> > > + if (dev_flow->tcf.tunnel) {
> > > + /*
> > > + * Replace the interface index, target for
> > > + * encapsulation, source for decapsulation.
> > > + */
> > > + assert(!dev_flow->tcf.tunnel->ifindex_tun);
> > > + assert(dev_flow->tcf.tunnel->ifindex_ptr);
> > > + /* Create actual VTEP device when rule is being applied. */
> > > + dev_flow->tcf.tunnel->ifindex_tun
> > > + = flow_tcf_tunnel_vtep_create(tcf,
> > > + *dev_flow->tcf.tunnel->ifindex_ptr,
> > > + dev_flow, error);
> > > + DRV_LOG(INFO, "Replace ifindex: %d->%d",
> > > + dev_flow->tcf.tunnel->ifindex_tun,
> > > + *dev_flow->tcf.tunnel->ifindex_ptr);
> > > + if (!dev_flow->tcf.tunnel->ifindex_tun)
> > > + return -rte_errno;
> > > + dev_flow->tcf.tunnel->ifindex_org
> > > + = *dev_flow->tcf.tunnel->ifindex_ptr;
> > > + *dev_flow->tcf.tunnel->ifindex_ptr
> > > + = dev_flow->tcf.tunnel->ifindex_tun;
> > > + }
> > > + ret = flow_tcf_nl_ack(tcf, nlh, 0, NULL, NULL);
> > > + if (dev_flow->tcf.tunnel) {
> > > + DRV_LOG(INFO, "Restore ifindex: %d->%d",
> > > + dev_flow->tcf.tunnel->ifindex_org,
> > > + *dev_flow->tcf.tunnel->ifindex_ptr);
> > > + *dev_flow->tcf.tunnel->ifindex_ptr
> > > + = dev_flow->tcf.tunnel->ifindex_org;
> > > + dev_flow->tcf.tunnel->ifindex_org = 0;
> >
> > ifindex_org looks a temporary storage in this code. And this kind of hassle
> > (replace/restore) is there because you took the ifindex from the netlink
> > message. Why don't you have just
> >
> > struct mlx5_flow_tcf_tunnel_hdr {
> > uint32_t type; /**< Tunnel action type. */
> > unsigned int ifindex; /**< Original dst/src interface */
> > struct mlx5_flow_tcf_vtep *vtep; /**< Tunnel endpoint device. */
> > unsigned int *nlmsg_ifindex_ptr; /**< ifindex ptr in Netlink message.
> > */ };
> >
> > and don't change ifindex?
>
> I propose to use the local variable for ifindex_org and do not keep it
> in structure. *ifindex_ptr will keep.
Well, you still have to restore the ifindex whenever sending the nl msg. Most of
all, ifindex_ptr in nl msg isn't a right place to store the ifindex. It should
have vtep ifindex but it just temporarily keeps the device ifindex until vtep is
created/found.
Thanks,
Yongseok
next prev parent reply other threads:[~2018-10-26 6:25 UTC|newest]
Thread overview: 110+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-02 6:30 [dpdk-dev] [PATCH 1/5] net/mlx5: add VXLAN encap/decap support for e-switch Slava Ovsiienko
2018-10-02 6:30 ` [dpdk-dev] [PATCH 2/5] net/mlx5: e-switch VXLAN netlink routines update Slava Ovsiienko
2018-10-02 6:30 ` [dpdk-dev] [PATCH 3/5] net/mlx5: e-switch VXLAN flow validation routine Slava Ovsiienko
2018-10-02 6:30 ` [dpdk-dev] [PATCH 4/5] net/mlx5: e-switch VXLAN flow translation routine Slava Ovsiienko
2018-10-02 6:30 ` [dpdk-dev] [PATCH 5/5] net/mlx5: e-switch VXLAN tunnel devices management Slava Ovsiienko
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 0/7] net/mlx5: e-switch VXLAN encap/decap hardware offload Viacheslav Ovsiienko
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 1/7] net/mlx5: e-switch VXLAN configuration and definitions Viacheslav Ovsiienko
2018-10-23 10:01 ` Yongseok Koh
2018-10-25 12:50 ` Slava Ovsiienko
2018-10-25 23:33 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 2/7] net/mlx5: e-switch VXLAN flow validation routine Viacheslav Ovsiienko
2018-10-23 10:04 ` Yongseok Koh
2018-10-25 13:53 ` Slava Ovsiienko
2018-10-26 3:07 ` Yongseok Koh
2018-10-26 8:39 ` Slava Ovsiienko
2018-10-26 21:56 ` Yongseok Koh
2018-10-29 9:33 ` Slava Ovsiienko
2018-10-29 18:26 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 3/7] net/mlx5: e-switch VXLAN flow translation routine Viacheslav Ovsiienko
2018-10-23 10:06 ` Yongseok Koh
2018-10-25 14:37 ` Slava Ovsiienko
2018-10-26 4:22 ` Yongseok Koh
2018-10-26 9:06 ` Slava Ovsiienko
2018-10-26 22:10 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 4/7] net/mlx5: e-switch VXLAN netlink routines update Viacheslav Ovsiienko
2018-10-23 10:07 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 5/7] net/mlx5: e-switch VXLAN tunnel devices management Viacheslav Ovsiienko
2018-10-25 0:28 ` Yongseok Koh
2018-10-25 20:21 ` Slava Ovsiienko
2018-10-26 6:25 ` Yongseok Koh [this message]
2018-10-26 9:35 ` Slava Ovsiienko
2018-10-26 22:42 ` Yongseok Koh
2018-10-29 11:53 ` Slava Ovsiienko
2018-10-29 18:42 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 6/7] net/mlx5: e-switch VXLAN encapsulation rules management Viacheslav Ovsiienko
2018-10-25 0:33 ` Yongseok Koh
2018-10-15 14:13 ` [dpdk-dev] [PATCH v2 7/7] net/mlx5: e-switch VXLAN rule cleanup routines Viacheslav Ovsiienko
2018-10-25 0:36 ` Yongseok Koh
2018-10-25 20:32 ` Slava Ovsiienko
2018-10-26 6:30 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Slava Ovsiienko
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 01/13] net/mlx5: prepare makefile for adding e-switch VXLAN Slava Ovsiienko
2018-11-01 20:33 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 02/13] net/mlx5: prepare meson.build " Slava Ovsiienko
2018-11-01 20:33 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-01 20:35 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-01 20:36 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 05/13] net/mlx5: swap items/actions validations for e-switch rules Slava Ovsiienko
2018-11-01 20:37 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 06/13] net/mlx5: add e-switch VXLAN support to validation routine Slava Ovsiienko
2018-11-01 20:49 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-01 21:03 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-01 21:18 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 09/13] net/mlx5: e-switch VXLAN netlink routines update Slava Ovsiienko
2018-11-01 21:21 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 10/13] net/mlx5: fix e-switch Flow counter deletion Slava Ovsiienko
2018-11-01 22:00 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 11/13] net/mlx5: add e-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-01 23:59 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 12/13] net/mlx5: add e-switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-02 0:01 ` Yongseok Koh
2018-11-01 12:19 ` [dpdk-dev] [PATCH v3 13/13] net/mlx5: add e-switch VXLAN rule cleanup routines Slava Ovsiienko
2018-11-02 0:01 ` Yongseok Koh
2018-11-01 20:32 ` [dpdk-dev] [PATCH v3 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Yongseok Koh
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 " Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 01/13] net/mlx5: prepare makefile for adding E-Switch VXLAN Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 01/13] net/mlx5: prepare makefile for adding E-Switch VXLAN Slava Ovsiienko
2018-11-12 20:01 ` [dpdk-dev] [PATCH 0/4] net/mlx5: prepare to add E-switch rule flags check Slava Ovsiienko
2018-11-12 20:01 ` [dpdk-dev] [PATCH 1/4] net/mlx5: prepare Netlink communication routine to fix Slava Ovsiienko
2018-11-13 13:21 ` Shahaf Shuler
2018-11-12 20:01 ` [dpdk-dev] [PATCH 2/4] net/mlx5: fix Netlink communication routine Slava Ovsiienko
2018-11-13 13:21 ` Shahaf Shuler
2018-11-14 12:57 ` Slava Ovsiienko
2018-11-12 20:01 ` [dpdk-dev] [PATCH 3/4] net/mlx5: prepare to add E-switch rule flags check Slava Ovsiienko
2018-11-12 20:01 ` [dpdk-dev] [PATCH 4/4] net/mlx5: add E-switch rule hardware offload flag check Slava Ovsiienko
2018-11-13 13:21 ` [dpdk-dev] [PATCH 0/4] net/mlx5: prepare to add E-switch rule flags check Shahaf Shuler
2018-11-14 14:56 ` Shahaf Shuler
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 02/13] net/mlx5: prepare meson.build for adding E-Switch VXLAN Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 05/13] net/mlx5: swap items/actions validations for E-Switch rules Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 06/13] net/mlx5: add E-Switch VXLAN support to validation routine Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 09/13] net/mlx5: update E-Switch VXLAN netlink routines Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 10/13] net/mlx5: fix E-Switch Flow counter deletion Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 11/13] net/mlx5: add E-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 12/13] net/mlx5: add E-Switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-03 6:18 ` [dpdk-dev] [PATCH v5 13/13] net/mlx5: add E-switch VXLAN rule cleanup routines Slava Ovsiienko
2018-11-04 6:48 ` [dpdk-dev] [PATCH v5 00/13] net/mlx5: e-switch VXLAN encap/decap hardware offload Shahaf Shuler
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 02/13] net/mlx5: prepare meson.build for adding E-Switch VXLAN Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 03/13] net/mlx5: add necessary definitions for " Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 04/13] net/mlx5: add necessary structures " Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 05/13] net/mlx5: swap items/actions validations for E-Switch rules Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 07/13] net/mlx5: add VXLAN support to flow prepare routine Slava Ovsiienko
2018-11-02 21:38 ` Yongseok Koh
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 06/13] net/mlx5: add E-Switch VXLAN support to validation routine Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 08/13] net/mlx5: add VXLAN support to flow translate routine Slava Ovsiienko
2018-11-02 21:53 ` Yongseok Koh
2018-11-02 23:29 ` Yongseok Koh
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 09/13] net/mlx5: update E-Switch VXLAN netlink routines Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 10/13] net/mlx5: fix E-Switch Flow counter deletion Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 11/13] net/mlx5: add E-switch VXLAN tunnel devices management Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 12/13] net/mlx5: add E-Switch VXLAN encapsulation rules Slava Ovsiienko
2018-11-02 17:53 ` [dpdk-dev] [PATCH v4 13/13] net/mlx5: add E-switch VXLAN rule cleanup routines Slava Ovsiienko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181026062307.GD6434@mtidpdk.mti.labs.mlnx \
--to=yskoh@mellanox.com \
--cc=dev@dpdk.org \
--cc=shahafs@mellanox.com \
--cc=viacheslavo@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).