From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 3F02E2A66 for ; Tue, 6 Dec 2016 19:17:46 +0100 (CET) Received: from orsmga002.jf.intel.com ([10.7.209.21]) by fmsmga103.fm.intel.com with ESMTP; 06 Dec 2016 10:11:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,310,1477983600"; d="scan'208";a="14452948" Received: from irsmsx107.ger.corp.intel.com ([163.33.3.99]) by orsmga002.jf.intel.com with ESMTP; 06 Dec 2016 10:11:39 -0800 Received: from irsmsx102.ger.corp.intel.com ([169.254.2.79]) by IRSMSX107.ger.corp.intel.com ([169.254.10.66]) with mapi id 14.03.0248.002; Tue, 6 Dec 2016 18:11:38 +0000 From: "Chandran, Sugesh" To: Kevin Traynor , Adrien Mazarguil CC: "dev@dpdk.org" , Thomas Monjalon , "De Lara Guarch, Pablo" , Olivier Matz , "sugesh.chandran@intel.comn" Thread-Topic: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API Thread-Index: AQHSQCYBGHBxiz6sm06ZxWXdKPvFC6Dx40iAgAD4jACAAmPWAIAGCHSg Date: Tue, 6 Dec 2016 18:11:38 +0000 Message-ID: <2EF2F5C0CC56984AA024D0B180335FCB13EC330D@IRSMSX102.ger.corp.intel.com> References: <1c8a8e4fec73ed33836f1da9525b1b8b53048518.1479309720.git.adrien.mazarguil@6wind.com> <59393e58-6c85-d2e5-1aab-a721fe9c933e@redhat.com> <20161201083652.GI10340@6wind.com> <7f65ba09-e6fe-d97a-6ab5-97e84a828a81@redhat.com> In-Reply-To: <7f65ba09-e6fe-d97a-6ab5-97e84a828a81@redhat.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-titus-metadata-40: eyJDYXRlZ29yeUxhYmVscyI6IiIsIk1ldGFkYXRhIjp7Im5zIjoiaHR0cDpcL1wvd3d3LnRpdHVzLmNvbVwvbnNcL0ludGVsMyIsImlkIjoiNjUwYjQxYTItYjBhOS00NjEyLWJmYzAtYzA4NmFlODJjMmVkIiwicHJvcHMiOlt7Im4iOiJDVFBDbGFzc2lmaWNhdGlvbiIsInZhbHMiOlt7InZhbHVlIjoiQ1RQX0lDIn1dfV19LCJTdWJqZWN0TGFiZWxzIjpbXSwiVE1DVmVyc2lvbiI6IjE1LjkuNi42IiwiVHJ1c3RlZExhYmVsSGFzaCI6ImlSVnAyYlNiajlQU2pRZzR2WGJlQjliNk5HZjdSa0hsdXI5a0IwMDh1QWM9In0= x-ctpclassification: CTP_IC x-originating-ip: [163.33.239.182] Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Dec 2016 18:17:47 -0000 Hi Adrien, Thanks for sending out the patches, Please find few comments below, Regards _Sugesh > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Kevin Traynor > Sent: Friday, December 2, 2016 9:07 PM > To: Adrien Mazarguil > Cc: dev@dpdk.org; Thomas Monjalon ; De > Lara Guarch, Pablo ; Olivier Matz > ; sugesh.chandran@intel.comn > Subject: Re: [dpdk-dev] [PATCH 01/22] ethdev: introduce generic flow API >=20 >>>>>>Snipp > >>> + * > >>> + * Attaches a 32 bit value to packets. > >>> + * > >>> + * This value is arbitrary and application-defined. For > >>> +compatibility with > >>> + * FDIR it is returned in the hash.fdir.hi mbuf field. > >>> +PKT_RX_FDIR_ID is > >>> + * also set in ol_flags. > >>> + */ > >>> +struct rte_flow_action_mark { > >>> + uint32_t id; /**< 32 bit value to return with packets. */ }; > >> > >> One use case I thought we would be able to do for OVS is > >> classification in hardware and the unique flow id is sent with the pac= ket to > software. > >> But in OVS the ufid is 128 bits, so it means we can't and there is > >> still the miniflow extract overhead. I'm not sure if there is a > >> practical way around this. > >> > >> Sugesh (cc'd) has looked at this before and may be able to comment or > >> correct me. > > > > Yes, we settled on 32 bit because currently no known hardware > > implementation supports more than this. If that changes, another > > action with a larger type shall be provided (no ABI breakage). > > > > Also since even 64 bit would not be enough for the use case you > > mention, there is no choice but use this as an indirect value (such as > > an array or hash table index/value). >=20 > ok, cool. I think Sugesh has other ideas anyway! [Sugesh] It should be fine with 32 bit . we can manage it in OVS accordingl= y. >=20 > > > > [...] > >>> +/** > >>> + * RTE_FLOW_ACTION_TYPE_RSS > >>> + * > >>> + > >>> + * > >>> + * Terminating by default. > >>> + */ > >>> +struct rte_flow_action_vf { > >>> + uint32_t original:1; /**< Use original VF ID if possible. */ > >>> + uint32_t reserved:31; /**< Reserved, must be zero. */ > >>> + uint32_t id; /**< VF ID to redirect packets to. */ }; > > [...] > >>> +/** > >>> + * Check whether a flow rule can be created on a given port. > >>> + * > >>> + * While this function has no effect on the target device, the flow > >>> +rule is > >>> + * validated against its current configuration state and the > >>> +returned value > >>> + * should be considered valid by the caller for that state only. > >>> + * > >>> + * The returned value is guaranteed to remain valid only as long as > >>> +no > >>> + * successful calls to rte_flow_create() or rte_flow_destroy() are > >>> +made in > >>> + * the meantime and no device parameter affecting flow rules in any > >>> +way are > >>> + * modified, due to possible collisions or resource limitations > >>> +(although in > >>> + * such cases EINVAL should not be returned). > >>> + * > >>> + * @param port_id > >>> + * Port identifier of Ethernet device. > >>> + * @param[in] attr > >>> + * Flow rule attributes. > >>> + * @param[in] pattern > >>> + * Pattern specification (list terminated by the END pattern item)= . > >>> + * @param[in] actions > >>> + * Associated actions (list terminated by the END action). > >>> + * @param[out] error > >>> + * Perform verbose error reporting if not NULL. > >>> + * > >>> + * @return > >>> + * 0 if flow rule is valid and can be created. A negative errno va= lue > >>> + * otherwise (rte_errno is also set), the following errors are def= ined: > >>> + * > >>> + * -ENOSYS: underlying device does not support this functionality. > >>> + * > >>> + * -EINVAL: unknown or invalid rule specification. > >>> + * > >>> + * -ENOTSUP: valid but unsupported rule specification (e.g. partia= l > >>> + * bit-masks are unsupported). > >>> + * > >>> + * -EEXIST: collision with an existing rule. > >>> + * > >>> + * -ENOMEM: not enough resources. > >>> + * > >>> + * -EBUSY: action cannot be performed due to busy device resources= , > may > >>> + * succeed if the affected queues or even the entire port are in a > stopped > >>> + * state (see rte_eth_dev_rx_queue_stop() and > rte_eth_dev_stop()). > >>> + */ > >>> +int > >>> +rte_flow_validate(uint8_t port_id, > >>> + const struct rte_flow_attr *attr, > >>> + const struct rte_flow_item pattern[], > >>> + const struct rte_flow_action actions[], > >>> + struct rte_flow_error *error); > >> > >> Why not just use rte_flow_create() and get an error? Is it less > >> disruptive to do a validate and find the rule cannot be created, than > >> using a create directly? > > > > The rationale can be found in the original RFC, which I'll convert to > > actual documentation in v2. In short: > > > > - Calling rte_flow_validate() before rte_flow_create() is useless since > > rte_flow_create() also performs validation. > > > > - We cannot possibly express a full static set of allowed flow rules, e= ven > > if we could, it usually depends on the current hardware configuration > > therefore would not be static. > > > > - rte_flow_validate() is thus provided as a replacement for capability > > flags. It can be used to determine during initialization if the under= lying > > device can support the typical flow rules an application might want t= o > > provide later and do something useful with that information (e.g. alw= ays > > use software fallback due to HW limitations). > > > > - rte_flow_validate() being a subset of rte_flow_create(), it is essent= ially > > free to expose. >=20 > make sense now, thanks. [Sugesh] : We had this discussion earlier at the design stage about the tim= e taken for programming the hardware, and how to make it deterministic. How about having a timeout parameter as w= ell for the rte_flow_* If the hardware flow insert is timed out, error out than waiting indefinite= ly, so that application have some control over The time to program the flow. It can be another set of APIs something like,= rte_flow_create_timeout() Are you going to provide any control over the initialization of NIC to def= ine the capability matrices For eg; To operate in a L3 router mode, software wanted to initialize the = NIC port only to consider the L2 and L3 fields. I assume the initialization is done based on the first rules that are progr= ammed into the NIC.? >=20 > > > >>> + > >>> +/** > >>> + * Create a flow rule on a given port. > >>> + * > >>> + * @param port_id > >>> + * Port identifier of Ethernet device. > >>> + * @param[in] attr > >>> + * Flow rule attributes. > >>> + * @param[in] pattern > >>> + * Pattern specification (list terminated by the END pattern item)= . > >>> + * @param[in] actions > >>> + * Associated actions (list terminated by the END action). > >>> + * @param[out] error > >>> + * Perform verbose error reporting if not NULL. > >>> + * > >>> + * @return > >>> + * A valid handle in case of success, NULL otherwise and rte_errno= is > set > >>> + * to the positive version of one of the error codes defined for > >>> + * rte_flow_validate(). > >>> + */ > >>> +struct rte_flow * > >>> +rte_flow_create(uint8_t port_id, > >>> + const struct rte_flow_attr *attr, > >>> + const struct rte_flow_item pattern[], > >>> + const struct rte_flow_action actions[], > >>> + struct rte_flow_error *error); > >> > >> General question - are these functions threadsafe? In the OVS example > >> you could have several threads wanting to create flow rules at the > >> same time for same or different ports. > > > > No they aren't, applications have to perform their own locking. The > > RFC (to be converted to actual documentation in v2) says that: > > > > - API operations are synchronous and blocking (``EAGAIN`` cannot be > > returned). > > > > - There is no provision for reentrancy/multi-thread safety, although > nothing > > should prevent different devices from being configured at the same > > time. PMDs may protect their control path functions accordingly. >=20 > other comment above wrt locking. >=20 > >