From: Thomas Monjalon
To: Andrew Rybchenko, Ivan Malov
Cc: dev@dpdk.org, Andy Moreton, orika@nvidia.com, ferruh.yigit@intel.com,
 olivier.matz@6wind.com
Date: Fri, 01 Oct 2021 11:48:52 +0200
Message-ID: <2522405.PTVv94qZMn@thomas>
References: <20210902142359.28138-1-ivan.malov@oktetlabs.ru>
 <5427719.I9DohtKF8S@thomas>
Subject: Re: [dpdk-dev] [PATCH v3 0/5] A means to negotiate delivery of Rx meta data

01/10/2021 10:55, Ivan Malov:
> On 01/10/2021 11:11, Thomas Monjalon wrote:
> > 01/10/2021 08:47, Andrew Rybchenko:
> >> On 9/30/21 10:30 PM, Ivan Malov wrote:
> >>> On 30/09/2021 19:18, Thomas Monjalon wrote:
> >>>> 23/09/2021 13:20, Ivan Malov:
> >>>>> In 2019, commit [1] announced changes in the DEV_RX_OFFLOAD
> >>>>> namespace intending to add new flags, RSS_HASH and FLOW_MARK.
> >>>>> Since then, only the former has been added. The problem hasn't
> >>>>> been solved. Applications still assume that no efforts are
> >>>>> needed to enable flow mark and similar meta data delivery.
> >>>>>
> >>>>> The team behind the net/sfc driver has to take over the effort
> >>>>> since the problem has started impacting us. Riverhead, a
> >>>>> cutting-edge Xilinx smart NIC family, has two Rx prefix types.
> >>>>> Rx meta data is available only from the long Rx prefix.
> >>>>> Switching between the prefix formats can't happen in the
> >>>>> started state. Hence, we run into the same problem which [1]
> >>>>> was aiming to solve.
> >>>>
> >>>> Sorry, I don't understand what an Rx prefix is.
> >>>
> >>> A small chunk of per-packet metadata in the Rx packet buffer
> >>> preceding the actual packet data. In terms of mbuf, this could be
> >>> something lying before m->data_off.
> >
> > I've never seen the term "Rx prefix".
> > In general we talk about mbuf headroom and mbuf metadata,
> > the rest being the mbuf payload and mbuf tailroom.
> > I guess you mean mbuf metadata in the space of the struct rte_mbuf?
>
> In this paragraph I describe the two ways the NIC itself can provide
> metadata buffers of different sizes. Hence the term "Rx prefix". As you
> understand, the NIC HW is unaware of DPDK, mbufs and any other SW
> concepts. To the NIC, this is an "Rx prefix", that is, a chunk of
> per-packet metadata *preceding* the actual packet data. It's the
> responsibility of the PMD to treat this the right way and take care of
> headroom, payload and tailroom. I describe the two Rx prefix formats in
> NIC terminology just to provide the gist of the problem.

OK, but it is confusing as it is vendor-specific.
Please stick with DPDK terms if possible.
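In mbuf terms, my understanding of what a PMD does with such a prefix
is roughly this (an illustrative sketch only; the prefix layout below
is invented, not the sfc driver's real format):

#include <stdint.h>
#include <rte_mbuf.h>

/* Invented layout of a "long" HW Rx prefix; real formats are
 * device-specific. */
struct example_rx_prefix {
        uint32_t flags;
        uint32_t mark;      /* MARK id set by a matching flow rule */
        uint64_t tunnel_id;
};

/* The HW has written the prefix at the start of the DMA'd buffer;
 * the PMD parses it and points m->data_off past it, so the prefix
 * ends up in what DPDK calls the headroom. */
static void
example_rx_parse_prefix(struct rte_mbuf *m)
{
        const struct example_rx_prefix *p = m->buf_addr;

        if (p->mark != 0) {
                m->hash.fdir.hi = p->mark;
                m->ol_flags |= PKT_RX_FDIR | PKT_RX_FDIR_ID;
        }
        m->data_off = (uint16_t)sizeof(*p); /* data follows the prefix */
}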
> >>>>> Rx meta data (mark, flag, tunnel ID) delivery is not an offload
> >>>>> on its own since the corresponding flows must be active to set
> >>>>> the data in the first place. Hence, adding offload flags
> >>>>> similar to RSS_HASH is not a good idea.
> >>>>
> >>>> What does "active" mean here?
> >>>
> >>> Active = inserted and functional. What this paragraph is trying to
> >>> say is that when you enable, say, RSS_HASH, that implies both
> >>> computation of the hash and the driver's ability to extract it from
> >>> packets ("delivery"). But when it comes to MARK, it's just
> >>> "delivery". No "offload" here: the NIC won't set any mark in packets
> >>> unless you create a flow rule to make it do so. That's the gist of
> >>> it.
> >
> > OK.
> > Yes, I agree RTE_FLOW_ACTION_TYPE_MARK doesn't need any offload flag.
> > Same for RTE_FLOW_ACTION_TYPE_SET_META.
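To make the MARK case concrete, the usual application-side sequence
looks like this (a minimal sketch; device setup and error handling
omitted):

#include <stdio.h>
#include <stdint.h>
#include <rte_mbuf.h>
#include <rte_flow.h>

/* Ask the NIC to mark every ingress Ethernet packet with id 42. */
static struct rte_flow *
mark_all_traffic(uint16_t port_id, struct rte_flow_error *err)
{
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item pattern[] = {
                { .type = RTE_FLOW_ITEM_TYPE_ETH },
                { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_mark mark = { .id = 42 };
        struct rte_flow_action actions[] = {
                { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
                { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, err);
}

/* The mark is then delivered per packet through the mbuf. */
static void
handle_rx(struct rte_mbuf *m)
{
        if (m->ol_flags & PKT_RX_FDIR_ID)
                printf("mark = %u\n", m->hash.fdir.hi);
}

Note that nothing in this sequence informs the PMD ahead of
rte_flow_create() that mark delivery will be needed, which is the gap
this series targets.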
> >>>>> Patch [1/5] of this series adds a generic API to let applications
> >>>>> negotiate delivery of Rx meta data during the initialisation
> >>>>> period.
> >
> > What is metadata?
> > Do you mean RTE_FLOW_ITEM_TYPE_META and RTE_FLOW_ITEM_TYPE_MARK?
> > The word "metadata" could cover any field in the mbuf struct, so it
> > is vague.
>
> Metadata here is *any* additional information provided by the NIC for
> each received packet. For example, Rx flag, Rx mark, RSS hash, packet
> classification info, you name it. I'd like to stress that the
> suggested API comes with flags, each of which is crystal clear on what
> concrete kind of metadata it covers, e.g. Rx mark.

I missed the flags. You mean these 3 flags?

+/** The ethdev sees flagged packets if there are flows with action FLAG. */
+#define RTE_ETH_RX_META_USER_FLAG (UINT64_C(1) << 0)
+
+/** The ethdev sees mark IDs in packets if there are flows with action MARK. */
+#define RTE_ETH_RX_META_USER_MARK (UINT64_C(1) << 1)
+
+/** The ethdev detects missed packets if there are "tunnel_set" flows in use. */
+#define RTE_ETH_RX_META_TUNNEL_ID (UINT64_C(1) << 2)

It is not crystal clear because it does not reference the API,
like RTE_FLOW_ACTION_TYPE_MARK.
And it covers a limited set of metadata.
Do you intend to extend to all mbuf metadata?
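If I read the intent correctly, an application would use these flags
roughly as below (the function name and signature are my guess from
the flag prefix, not quoted from patch [1/5]):

#include <stdio.h>
#include <stdint.h>
#include <rte_ethdev.h>

/* Hypothetical prototype inferred from the flag names; see patch
 * [1/5] for the real one. */
int rte_eth_rx_meta_negotiate(uint16_t port_id, uint64_t *features);

static int
negotiate_rx_meta(uint16_t port_id)
{
        uint64_t features = RTE_ETH_RX_META_USER_FLAG |
                            RTE_ETH_RX_META_USER_MARK;
        int ret;

        /* Called before rte_eth_dev_configure(); the PMD is expected
         * to clear the bits it cannot deliver. */
        ret = rte_eth_rx_meta_negotiate(port_id, &features);
        if (ret != 0)
                return ret;

        if ((features & RTE_ETH_RX_META_USER_MARK) == 0)
                printf("no MARK delivery; skip inserting MARK rules\n");

        return 0;
}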
> >>>>> This way, an application knows right from the start which parts
> >>>>> of Rx meta data won't be delivered. Hence, there is no need to
> >>>>> try inserting flows requesting such data and handle the failures.
> >>>>
> >>>> Sorry, I don't understand the problem you want to solve.
> >>>> And sorry for not noticing earlier.
> >>>
> >>> No worries. *Some* PMDs do not enable delivery of, say, Rx mark
> >>> with the packets by default (for performance reasons). If the
> >>> application tries to insert a flow with action MARK, the PMD may
> >>> not be able to enable delivery of Rx mark without restarting the
> >>> Rx sub-system. And that's fraught with traffic disruption and
> >>> similar bad consequences. In order to address it, we need to let
> >>> the application express its interest in receiving mark with packets
> >>> as early as possible. This way, the PMD can enable Rx mark delivery
> >>> in advance. And, as an additional benefit, the application can
> >>> learn *from the very beginning* whether it will be possible to use
> >>> the feature or not. If this API tells the application that no mark
> >>> delivery will be enabled, then the application can just skip many
> >>> unnecessary attempts to insert knowingly unsupported flows during
> >>> runtime.
> >
> > I'm puzzled, because we could have the same reasoning for any
> > offload.
>
> We're not discussing *offloads*. An offload is when the NIC *computes
> something* and *delivers* it. We are discussing precisely *delivery*.

OK, but still, there is a lot more mbuf metadata delivered.

> > I don't understand why we are focusing on mark only.
>
> We are not focusing on mark on purpose. It's just how our discussion
> goes. I chose mark (could've chosen flag or anything else) just to
> show you an example.
>
> > I would prefer we find a generic solution using the rte_flow API.
> > Can we make rte_flow_validate() work before port start?
> > If validating a fake rule doesn't make sense,
> > why not have a new function accepting a single action as parameter?
>
> A noble idea, but if we feed the entire flow rule to the driver for
> validation, then the driver can't just look for actions FLAG or MARK
> in it (to enable or disable metadata delivery): it is obliged to also
> validate the match criteria, attributes, etc. And, if something is
> unsupported (say, some specific item), the driver will have to reject
> the rule as a whole, thus leaving the application to join the dots
> itself.
>
> Say, you ask the driver to validate the following rule:
> pattern blah-blah-1 / blah-blah-2 / end action flag / end
> intending to check support for FLAG delivery. Suppose the driver
> doesn't support pattern item "blah-blah-1". It will throw an error
> right after seeing this unsupported item and won't even go further to
> see the action FLAG. How can the application know whether its request
> for FLAG was heard or not?

No, I'm proposing a new function to validate the action alone,
without any match etc. Example:
rte_flow_action_request(RTE_FLOW_ACTION_TYPE_MARK)

> And I'd not bind delivery of metadata to the flow API. Consider the
> following example. We have a DPDK application sitting at the *host*
> and we have a *guest* with its *own* DPDK instance. The guest DPDK has
> asked the NIC (by virtue of the flow API) to mark all outgoing
> packets. These packets reach the *host* DPDK. Say, the host
> application just wants to see the marked packets from the guest. Its
> own (the host's) use of the flow API is a don't-care here. The host
> doesn't want to mark packets itself; it wants to see packets marked by
> the guest.

It does not make sense to me. We are talking about a DPDK API.
My concern is to avoid redefining new flags
while we already have rte_flow actions.
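To spell out the shape of the alternative I am suggesting (purely
hypothetical; no such function exists in rte_flow today):

#include <stdbool.h>
#include <stdint.h>
#include <rte_flow.h>

/* Hypothetical: ask the PMD, before port start, whether packets can
 * carry the metadata produced by a given action type, with no pattern
 * or full rule involved. */
int rte_flow_action_request(uint16_t port_id,
                            enum rte_flow_action_type type,
                            struct rte_flow_error *error);

static bool
can_deliver_mark(uint16_t port_id)
{
        struct rte_flow_error err;

        return rte_flow_action_request(port_id,
                                       RTE_FLOW_ACTION_TYPE_MARK,
                                       &err) == 0;
}

The PMD could then treat a successful probe as the early hint to
enable mark delivery before start, without any new ethdev flags.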