From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 10 Apr 2018 12:19:58 +0200
From: Adrien Mazarguil
To: Mohammad Abdul Awal
Cc: Declan Doherty, dev@dpdk.org
Message-ID: <20180410101958.GF4957@6wind.com>
References: <1523017443-12414-1-git-send-email-declan.doherty@intel.com>
 <1523017443-12414-3-git-send-email-declan.doherty@intel.com>
 <20180406202641.GS4957@6wind.com>
Subject: Re: [dpdk-dev] [PATCH v3 2/4] ethdev: Add tunnel encap/decap actions

On Mon, Apr 09, 2018 at 05:10:35PM +0100, Mohammad Abdul Awal wrote:
> On 06/04/2018 21:26, Adrien Mazarguil wrote:
> > On Fri, Apr 06, 2018 at 01:24:01PM +0100, Declan Doherty wrote:
> > > Add new flow action types and associated action data structures to
> > > support the encapsulation and decapsulation of the virtual tunnel
> > > endpoints.
> > >
> > > The RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP action will cause the matching
> > > flow to be encapsulated in the virtual tunnel endpoint overlay
> > > defined in the tunnel_encap action data.
> > >
> > > The RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP action will cause all virtual
> > > tunnel endpoint overlays up to and including the first instance of
> > > the flow item type defined in the tunnel_decap action data to be
> > > removed from the matching flows.
> > >
> > > Signed-off-by: Declan Doherty
> >
> > This generic approach looks flexible enough to cover the use cases that
> > immediately come to mind (VLAN, VXLAN); its design is sound.
> >
> > However, while I'm aware it's not a concern at this point, it won't be
> > able to deal with stateful tunnel or encapsulation types (e.g. IPsec or
> > TCP) which will require additional metadata or some run-time assistance
> > from the application.
> >
> > Eventually, for more complex use cases, dedicated encap/decap actions
> > will have to appear, so the issue I wanted to raise before going
> > further is this:
> >
> > Going generic inevitably trades some of the usability; flat structures
> > dedicated to VXLAN encap/decap with only the needed info to get the
> > job done would likely be easier to implement in PMDs and use in
> > applications. Any number of such actions can be added to rte_flow
> > without ABI impact.
> >
> > If VXLAN is the only use case at this point, my suggestion would be to
> > go with simpler RTE_FLOW_ACTION_TYPE_VXLAN_(ENCAP|DECAP) actions, with
> > fixed L2/L3/L4/L5 header definitions to prepend according to RFC 7348.
> We can go this way, but it will keep increasing the number of actions as
> more and more tunneling protocols are added. The current proposal is
> already a generic approach which serves as a tunnel for all the
> tunneling protocols.

Right, on the other hand there are not that many standard encapsulations
offloaded by existing devices. rte_flow could easily handle dedicated
actions for each of them without problem.

My point is that many of those (will eventually) have their own quirks to
manage when doing encap/decap; it's not just a matter of prepending or
removing a bunch of header definitions, otherwise we could as well let
applications simply provide an arbitrary buffer to prepend.

Consider that the "generic" part is already built into rte_flow as the
way patterns and actions are handled. Adding another generic layer on top
of that could make things more inconvenient than necessary for
applications (my main concern). You'd need another layer of
validation/error reporting machinery to properly let applications know
they cannot encap VXLAN on top of TCP on top of QinQinQinQinQ, for
instance. Either a single bounded encapsulation definition or a
combination at the action list level is needed to avoid that.
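To illustrate the bounded alternative, here is a rough sketch of what a
dedicated VXLAN action could look like; the structure name and layout
are hypothetical, nothing below exists in rte_flow or in this patch:

#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_udp.h>

/*
 * Sketch only: a fixed VXLAN encapsulation action per RFC 7348.
 * Flat L2/L3/L4/L5 headers to prepend, nothing else for PMDs to
 * iterate on and nothing for applications to get wrong.
 */
struct rte_flow_action_vxlan_encap {
	struct ether_hdr eth;   /* outer Ethernet header */
	struct ipv4_hdr ipv4;   /* outer IPv4 header */
	struct udp_hdr udp;     /* outer UDP header (dst port 4789) */
	struct vxlan_hdr vxlan; /* VXLAN header carrying the 24-bit VNI */
};

Such a structure leaves no room for interpretation: a PMD either supports
prepending this exact header stack or rejects the rule at validation
time.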
> > Now we can start with the generic approach, see how it fares and add
> > dedicated encap/decap later as needed.
> >
> > More comments below.
> >
> > > +Action: ``TUNNEL_ENCAP``
> > > +^^^^^^^^^^^^^^^^^^^^^^^^
> > > +
> > > +Performs an encapsulation action by encapsulating the flows matched
> > > +by the pattern items according to the network overlay defined in
> > > +the ``rte_flow_action_tunnel_encap`` pattern items.
> > > +
> > > +This action modifies the payload of matched flows. The pattern
> > > +items specified in the ``rte_flow_action_tunnel_encap`` action
> > > +structure must define a valid set of overlay headers, from the
> > > +Ethernet header up to the overlay header. The pattern must be
> > > +terminated with the RTE_FLOW_ITEM_TYPE_END item type.
> > Regarding the use of a pattern list, if you consider PMDs are already
> > iterating on a list of actions when encountering
> > RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP, it adds yet another inner loop.
> We understand that this is implementation specific. If we do not go for
> another inner loop, all the bundling needs to be handled in the same
> function, which seems clumsier to me. This also breaks the tunnel
> endpoint concept.
>
> > How about making each encountered RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP
> > provide exactly one item instead (in encap, i.e. reverse order)?
> Again, if we have tunnel action, security action, and other actions, all
> the processing and tracking need to be done in one function. Now we will
> need ETH_ENCAP/DECAP, UDP_ENCAP/DECAP, NVGRE_ENCAP/DECAP, etc.

Well, the number of DECAP actions doesn't need to perfectly reflect that
of ENCAP since it implies all preceding layers. No problem with that.

Regarding multiple dedicated actions, my suggestion was for a single
generic one as in this patch, but each instance on the ENCAP side would
deal with a single protocol layer, instead of having a single ENCAP
action with multiple inner layers (and thus an inner loop). PMDs also
gain the ability to precisely report which encap step fails by making
rte_flow_error point to the problematic object, which eases debugging of
flow rules on the application side.

Why would that break the tunnel idea and, more importantly, how would it
prevent PMD developers from splitting their processing into multiple
functions?
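In code, the single-layer variant I'm suggesting would look roughly like
this; the modified action contents are hypothetical, reuse the existing
struct rte_flow_item, and assume each ENCAP action prepends its header
to the packet:

#include <rte_flow.h>

/* Sketch only: one standard pattern item, i.e. one protocol layer,
 * per ENCAP action, instead of an inner list to iterate on. */
struct rte_flow_action_tunnel_encap {
	struct rte_flow_item item; /* enclosed spec/last/mask pointers */
};

/* Instances filled with ETH/IPV4/UDP/VXLAN item data respectively. */
struct rte_flow_action_tunnel_encap encap_eth, encap_ipv4, encap_udp,
	encap_vxlan;

/* The VXLAN overlay of the documentation example above then becomes a
 * list of single-layer actions; since each one prepends its header,
 * layers appear in reverse (inner to outer) order. */
struct rte_flow_action actions[] = {
	{ .type = RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP, .conf = &encap_vxlan },
	{ .type = RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP, .conf = &encap_udp },
	{ .type = RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP, .conf = &encap_ipv4 },
	{ .type = RTE_FLOW_ACTION_TYPE_TUNNEL_ENCAP, .conf = &encap_eth },
	{ .type = RTE_FLOW_ACTION_TYPE_END },
};

With this layout, a PMD that cannot handle a given step simply makes
rte_flow_error's "cause" point at the offending action, no extra error
machinery needed.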
> > In which case perhaps "GENERIC" would be a better fit than "TUNNEL".
> >
> > > +
> > > +   +-------+--------------------------+------------+
> > > +   | Index | Flow Item Type           | Flow Item  |
> > > +   +=======+==========================+============+
> > > +   | 0     | RTE_FLOW_ITEM_TYPE_ETH   | eth item   |
> > > +   +-------+--------------------------+------------+
> > > +   | 1     | RTE_FLOW_ITEM_TYPE_IPV4  | ipv4 item  |
> > > +   +-------+--------------------------+------------+
> > > +   | 2     | RTE_FLOW_ITEM_TYPE_UDP   | udp item   |
> > > +   +-------+--------------------------+------------+
> > > +   | 3     | RTE_FLOW_ITEM_TYPE_VXLAN | vxlan item |
> > > +   +-------+--------------------------+------------+
> > > +   | 4     | RTE_FLOW_ITEM_TYPE_END   | NULL       |
> > > +   +-------+--------------------------+------------+
> > One possible issue is that it relies on objects normally found on the
> > pattern side of flow rules. Those are supposed to match something,
> > they are not intended for packet header generation. While their "spec"
> > and "mask" fields might make sense in this context, the "last" field
> > is odd.
> >
> > You must define them without leaving anything open for interpretation
> > by PMDs and users alike. Defining things as "undefined" is fine as
> > long as it's covered.
> Please note that the "void *item" in the
> "rte_flow_action_tunnel_encap.pattern" points to the data structure
> defined for the corresponding rte_flow_item_type instead of a
> rte_flow_item structure. As an example, for the rte_flow_item_eth type,
> the "void *item" will point to a "struct rte_flow_item_eth" instance.
> That's why we have defined struct rte_flow_action_item inside struct
> rte_flow_action_tunnel_encap. So, no question of spec, mask, last
> anymore.

Right, I noticed that after commenting its structure definition below.

I think I won't be the only one confused by this approach, also because a
mask is needed in addition to a specification structure, otherwise how do
you plan to tell what fields are relevant in application-provided
protocol headers? An application might set unusual IPv4/UDP/VXLAN fields
and expect them to be part of the encapsulated traffic. Without a mask, a
PMD must take headers verbatim, and I don't think many devices are ready
for that yet.

Hence my other suggestion: defining inflexible $PROTOCOL_(ENCAP|DECAP)
actions that do not allow more than what's defined by official RFCs for
$PROTOCOL.

> > > + */
> > > +struct rte_flow_action_tunnel_encap {
> > > +	struct rte_flow_action_item {
> > > +		enum rte_flow_item_type type;
> > > +		/**< Flow item type. */
> > > +		const void *item;
> > > +		/**< Flow item definition which points to the data of
> > > +		 * corresponding rte_flow_item_type.
> > > +		 */
> > I see it's a new action type, albeit a bit confusing (there is no
> > RTE_FLOW_ACTION_TYPE_ITEM).
> >
> > I suggest the standard pattern item type since you're going with enum
> > rte_flow_item_type anyway. Keep in mind you need some kind of mask to
> > tell what fields are relevant. An application might otherwise want to
> > encap with unsupported properties (e.g. specific IPv4 ToS field and
> > whatnot).
> >
> > How about a single "struct rte_flow_pattern_item item", neither const
> > nor a pointer? It's generic enough, enclosed spec/last/mask pointers
> > take care of the specifics. You just need to define what's supposed
> > to happen when "last" is set.
> Please see the comment above regarding this field.

Point still stands: if you need to distinguish spec and mask, a more
complete structure is needed. Instead of adding a new (confusing) type,
you should use rte_flow_item and define what happens when "last" is set.
You should define it as reserved for now; any non-NULL value is an error.
This field might also be useful later.
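To make the mask requirement concrete, here is roughly how an application
would then request a specific ToS value in the outer IPv4 header; this
sketch assumes the hypothetical single-item action layout shown earlier,
and all addresses are illustrative:

#include <rte_byteorder.h>
#include <rte_flow.h>
#include <rte_ip.h>

/* Spec provides the outer header contents... */
struct rte_flow_item_ipv4 ipv4_spec = {
	.hdr = {
		.type_of_service = 0xb8, /* DSCP EF requested */
		.src_addr = rte_cpu_to_be_32(IPv4(192, 168, 0, 1)),
		.dst_addr = rte_cpu_to_be_32(IPv4(192, 168, 0, 2)),
	},
};

/* ...while the mask tells the PMD which fields the application
 * actually cares about; anything left at zero is up to the PMD. */
struct rte_flow_item_ipv4 ipv4_mask = {
	.hdr = {
		.type_of_service = 0xff,
		.src_addr = RTE_BE32(UINT32_MAX),
		.dst_addr = RTE_BE32(UINT32_MAX),
	},
};

struct rte_flow_action_tunnel_encap encap_ipv4 = {
	.item = {
		.type = RTE_FLOW_ITEM_TYPE_IPV4,
		.spec = &ipv4_spec,
		.last = NULL, /* reserved, non-NULL is an error for now */
		.mask = &ipv4_mask,
	},
};

A PMD unable to control a masked field (e.g. ToS) can then reject the
rule during validation instead of silently generating wrong headers.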
> > > +};
> > > +
> > > +/**
> > > + * RTE_FLOW_ACTION_TYPE_TUNNEL_DECAP
> > > + *
> > > + * Virtual tunnel end-point decapsulation action data.
> > > + *
> > > + * Non-terminating action by default.
> > > + */
> > > +struct rte_flow_action_tunnel_decap {
> > > +	enum rte_flow_item_type type;
> > > +	/**<
> > > +	 * Flow item type of virtual tunnel end-point to be decapsulated
> > > +	 */
> > > +};
> > Note that contrary to ENCAP, DECAP wouldn't necessarily need repeated
> > actions to peel each layer off. The current definition is fine.
> To clarify, it is up to the PMD to remove all the headers for the
> specified type during decap. For example, for rte_flow_item_type
> type=RTE_FLOW_ITEM_TYPE_VXLAN, the PMD will peel off the (ETH, IPV4,
> UDP, VXLAN) headers all together.

Yep, that's fine, whether we use multiple actions or a single one doing
multiple things, a single DECAP can peel them off all at once :)

> > > +
> > > +/**
> > >  * Definition of a single action.
> > >  *
> > >  * A list of actions is terminated by a END action.
> > > --
> > > 2.7.4

If the reasons I gave did not manage to convince you about choosing
between either fixed (VLAN|VXLAN)_(ENCAP|DECAP) actions or generic
encap/decap actions that deal with a single protocol layer at once
instead of the proposed approach, I'm ready to try it out as an
experimental API (all new objects tagged as experimental) *if* you
address the lack of mask, which remains an open issue.

-- 
Adrien Mazarguil
6WIND