To: Jerin Jacob,
 dev@dpdk.org, Thomas Monjalon, Helin Zhang, Jingjing Wu, Rasesh Mody,
 Ajit Khaparde, Rahul Lakkireddy, Wenzhuo Lu, Jan Medala, John Daley,
 Jing Chen, Konstantin Ananyev, Matej Vido, Alejandro Lucero, Sony Chacko,
 Pablo de Lara, Olga Shern
References: <20160705181646.GO7621@6wind.com>
 <20160711104141.GA10172@localhost.localdomain>
 <20160721192023.GU7621@6wind.com> <5793DD3E.3080605@gmail.com>
From: John Fastabend
Message-ID: <57A0E423.2030804@gmail.com>
Date: Tue, 2 Aug 2016 11:19:15 -0700
In-Reply-To: <5793DD3E.3080605@gmail.com>
Subject: Re: [dpdk-dev] [RFC] Generic flow director/filtering/classification API
List-Id: patches and discussions about DPDK

On 16-07-23 02:10 PM, John Fastabend wrote:
> On 16-07-21 12:20 PM, Adrien Mazarguil wrote:
>> Hi Jerin,
>>
>> Sorry, looks like I missed your reply. Please see below.
>>
>
> Hi Adrian,
>
> Sorry for a bit delay but a few comments that may be worth considering.
>
> To start with completely agree on the general problem statement and the
> nice summary of all the current models. Also good start on this.
>
>>
>> Considering that allowed pattern/actions combinations cannot be known in
>> advance and would result in an unpractically large number of capabilities to
>> expose, a method is provided to validate a given rule from the current
>> device configuration state without actually adding it (akin to a "dry run"
>> mode).
>
> Rather than have a query/validate process why did we jump over having an
> intermediate representation of the capabilities?
> Here you state it is
> unpractical but we know how to represent parse graphs and the drivers
> could report their supported parse graph via a single query to a middle
> layer.
>
> This will actually reduce the msg chatter imagine many applications at
> init time or in boundary cases where a large set of applications come
> online at once and start banging on the interface all at once seems less
> than ideal.
>

A bit more detail on a possible interface for the capabilities query.

One way I've used to describe these graphs from driver to software
stacks is to use a set of structures to build the graph. For fixed
graphs this could just be a *.h file; for programmable hardware
(typically coming from a fw update on NICs) the driver can read the
parser details out of firmware and render the structures.

I've done this two ways: one is to define all the fields in their
own structures using something like,

	struct field {
		char *name;
		u32 uid;
		u32 bitwidth;
	};

This gives a unique id (uid) for each field along with its
width and a user friendly name. The fields are organized into
headers via a header structure,

	struct header_node {
		char *name;
		u32 uid;
		u32 *fields;
		struct parse_graph *jump;
	};

Each node has a unique id and a list of fields, where 'fields' is a
list of field uids. It's also easy enough to embed the field struct
in the header_node if that is simpler; it's really a style question.

The 'struct parse_graph' gives the list of edges from this header node
to other header nodes, using a parse graph structure defined as,

	struct parse_graph {
		struct field_reference ref;
		__u32 jump_uid;
	};

Again as a matter of style you can embed the parse graph in the header
node as I did above or do it as its own object.

The field_reference noted below gives the id of the field and the value,
e.g. for the tuple (ipv4.protocol, 6) jump_uid would be the uid of TCP.

	struct field_reference {
		__u32 header_uid;
		__u32 field_uid;
		__u32 mask_type;
		__u32 type;
		__u8 *value;
		__u8 *mask;
	};

The cost of doing all this is some additional overhead at init time. But
building generic functions over this and having a set of predefined
uids for well-known protocols such as ip, udp, tcp, etc. helps. What you
get for the cost is a few things that I think are worth it:

(i) new protocols can be added/removed without recompiling DPDK,
(ii) a software package can use the capability query to verify the
required protocols are off-loadable vs a possibly large set of test
queries,
(iii) when we do the programming of the device we can provide a tuple
(table-uid, header-uid, field-uid, value, mask, priority) and the
middle layer "knowing" the above graph can verify the command so
drivers only ever see "good" commands, and
(iv) finally it should be faster in terms of cmds per second because
the drivers can map the tuple (table, header, field, priority) to a
slot efficiently vs parsing.

IMO points (iii) and (iv) will in practice make the code much simpler
because we can maintain a common middle layer and not require parsing
by drivers, making each driver simpler by abstracting into a common
layer.

> Worse in my opinion it requires all drivers to write mostly duplicating
> validation code where a common layer could easily do this if every
> driver reported a common data structure representing its parse graph
> instead. The nice fallout of this initial effort upfront is the driver
> no longer needs to do error handling/checking/etc and can assume all
> rules are correct and valid. It makes driver code much simpler to
> support. And IMO at least by doing this we get some other nice benefits
> described below.
>
> Another related question is about performance.
>
>> Creation
>> ~~~~~~~~
>>
>> Creating a flow rule is similar to validating one, except the rule is
>> actually created.
>>
>> ::
>>
>>  struct rte_flow *
>>  rte_flow_create(uint8_t port_id,
>>                  const struct rte_flow_pattern *pattern,
>>                  const struct rte_flow_actions *actions);
>
> I gather this implies that each driver must parse the pattern/action
> block and map this onto the hardware. How many rules per second can this
> support? I've run into systems that expect a level of service somewhere
> around 50k cmds per second. So bulking will help at the message level
> but it seems like a lot of overhead to unpack the pattern/action section.
>
> One strategy I've used in other systems that worked relatively well
> is if the query for the parse graph above returns a key for each node
> in the graph then a single lookup can map the key to a node. Its
> unambiguous and then these operations simply become a table lookup.
> So to be a bit more concrete this changes the pattern structure in
> rte_flow_create() into a tuple where the key is known
> by the initial parse graph query. If you reserve a set of well-defined
> key values for well known protocols like ethernet, ip, etc. then the
> query model also works but the middle layer catches errors in this case
> and again the driver only gets known good flows. So something like this,
>
>   struct rte_flow_pattern {
>	uint32_t priority;
>	uint32_t key;
>	uint32_t value_length;
>	u8 *value;
>   }
>
> Also if we have multiple tables what do you think about adding a
> table_id to the signature. Probably not needed in the first generation
> but is likely useful for hardware with multiple tables so that it
> would be,
>
>    rte_flow_create(uint8_t port_id, uint8_t table_id, ...);
>
> Finally one other problem we've had which would be great to address
> if we are doing a rewrite of the API is adding new protocols to
> already deployed DPDK stacks. This is mostly a Linux distribution
> problem where you can't easily update DPDK.
>
> In the prototype header linked in this document it seems to add new
> headers requires adding a new enum in the rte_flow_item_type but there
> is at least an attempt at a catch all here,
>
>> 	/**
>> 	 * Matches a string of a given length at a given offset (in bytes),
>> 	 * or anywhere in the payload of the current protocol layer
>> 	 * (including L2 header if used as the first item in the stack).
>> 	 *
>> 	 * See struct rte_flow_item_raw.
>> 	 */
>> 	RTE_FLOW_ITEM_TYPE_RAW,
>
> Actually this is a nice implementation because it works after the
> previous item in the stack correct? So you can put it after "known"
> variable length headers like IP. The limitation is it can't get past
> undefined variable length headers. However if you use the above parse
> graph reporting from the driver mechanism and the driver always reports
> its largest supported graph then we don't have this issue where a new
> hardware sku/ucode/etc added support for new headers but we have no
> way to deploy it to existing software users without recompiling and
> redeploying.
>
> I looked at the git repo but I only saw the header definition I guess
> the implementation is TBD after there is enough agreement on the
> interface?
>
> Thanks,
> John
>
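
To make the middle-layer idea above a bit more concrete, here is a
minimal sketch of what the common validation could look like. It is
not a proposed API, only an illustration under some assumptions: the
uid values, the simplified 'struct edge' (a stand-in for the
parse_graph/field_reference pair that carries a plain integer value
instead of value/mask pointers), the static ethernet/ipv4/tcp graph and
the function names validate_match() and next_header() are all
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical well-known uids; real values would be agreed on elsewhere. */
enum { HDR_ETH = 1, HDR_IPV4 = 2, HDR_TCP = 3 };
enum { FLD_ETH_TYPE = 1, FLD_IPV4_PROTO = 2, FLD_TCP_DPORT = 3 };

struct field {
	const char *name;
	uint32_t uid;
	uint32_t bitwidth;
};

/* Simplified edge: if field_uid == value in this header, go to jump_uid. */
struct edge {
	uint32_t field_uid;
	uint32_t value;
	uint32_t jump_uid;
};

struct header_node {
	const char *name;
	uint32_t uid;
	const struct field *fields;
	size_t n_fields;
	const struct edge *jump;
	size_t n_jump;
};

/* A tiny fixed graph a driver might report: ethernet -> ipv4 -> tcp. */
static const struct field eth_fields[] = { { "ethertype", FLD_ETH_TYPE, 16 } };
static const struct field ipv4_fields[] = { { "protocol", FLD_IPV4_PROTO, 8 } };
static const struct field tcp_fields[] = { { "dst_port", FLD_TCP_DPORT, 16 } };
static const struct edge eth_jump[] = { { FLD_ETH_TYPE, 0x0800, HDR_IPV4 } };
static const struct edge ipv4_jump[] = { { FLD_IPV4_PROTO, 6, HDR_TCP } };

static const struct header_node graph[] = {
	{ "ethernet", HDR_ETH, eth_fields, 1, eth_jump, 1 },
	{ "ipv4", HDR_IPV4, ipv4_fields, 1, ipv4_jump, 1 },
	{ "tcp", HDR_TCP, tcp_fields, 1, NULL, 0 },
};

/*
 * Middle-layer check: the (header_uid, field_uid) pair must exist in the
 * reported graph and the value must fit in the field's bit width.
 * Returns 0 when the match is acceptable, -1 otherwise.
 */
static int validate_match(const struct header_node *g, size_t n,
			  uint32_t header_uid, uint32_t field_uid,
			  uint64_t value)
{
	assert(g != NULL);
	for (size_t i = 0; i < n; i++) {
		if (g[i].uid != header_uid)
			continue;
		for (size_t j = 0; j < g[i].n_fields; j++) {
			const struct field *f = &g[i].fields[j];

			if (f->uid != field_uid)
				continue;
			if (f->bitwidth < 64 && (value >> f->bitwidth) != 0)
				return -1; /* value wider than the field */
			return 0;
		}
	}
	return -1; /* unknown header or field */
}

/* Follow an edge: returns the uid of the next header, or 0 if none matches. */
static uint32_t next_header(const struct header_node *g, size_t n,
			    uint32_t header_uid, uint32_t field_uid,
			    uint32_t value)
{
	for (size_t i = 0; i < n; i++) {
		if (g[i].uid != header_uid)
			continue;
		for (size_t j = 0; j < g[i].n_jump; j++)
			if (g[i].jump[j].field_uid == field_uid &&
			    g[i].jump[j].value == value)
				return g[i].jump[j].jump_uid;
	}
	return 0;
}
```

With this graph, validate_match(graph, 3, HDR_IPV4, FLD_IPV4_PROTO, 6)
is accepted and next_header() for the same tuple gives HDR_TCP, while a
protocol value of 300 or an ipv4 field on the tcp header is rejected
before any driver sees it. The linear scans are only for readability;
with small dense uids a direct array index would do the same job.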