To: Thomas Monjalon, Ivan Malov
Cc: dev@dpdk.org, Andy Moreton, orika@nvidia.com, ferruh.yigit@intel.com, olivier.matz@6wind.com
References: <20210902142359.28138-1-ivan.malov@oktetlabs.ru>
 <8e727e12-6655-43b9-9af3-bcc5b882508d@oktetlabs.ru>
 <9f44035b-9569-746a-d2cd-73a793348f31@oktetlabs.ru>
 <5427719.I9DohtKF8S@thomas>
From: Andrew Rybchenko
Organization: OKTET Labs
Message-ID: <8e1ef3a6-ccf3-abf7-4862-5e6eee9c476d@oktetlabs.ru>
Date: Fri, 1 Oct 2021 11:54:36 +0300
In-Reply-To: <5427719.I9DohtKF8S@thomas>
Subject: Re: [dpdk-dev] [PATCH v3 0/5] A means to negotiate delivery of Rx meta data
List-Id: DPDK patches and discussions

On 10/1/21 11:11 AM, Thomas Monjalon wrote:
> 01/10/2021 08:47, Andrew Rybchenko:
>> On 9/30/21 10:30 PM, Ivan Malov wrote:
>>> Hi Thomas,
>>>
>>> On 30/09/2021 19:18, Thomas Monjalon wrote:
>>>> 23/09/2021 13:20, Ivan Malov:
>>>>> In 2019, commit [1] announced changes in DEV_RX_OFFLOAD namespace
>>>>> intending to add new flags, RSS_HASH and FLOW_MARK. Since then,
>>>>> only the former has been added. The problem hasn't been solved.
>>>>> Applications still assume that no efforts are needed to enable
>>>>> flow mark and similar meta data delivery.
>>>>>
>>>>> The team behind net/sfc driver has to take over the efforts since
>>>>> the problem has started impacting us. Riverhead, a cutting edge
>>>>> Xilinx smart NIC family, has two Rx prefix types. Rx meta data
>>>>> is available only from long Rx prefix. Switching between the
>>>>> prefix formats can't happen in started state. Hence, we run
>>>>> into the same problem which [1] was aiming to solve.
>>>>
>>>> Sorry I don't understand what is Rx prefix?
>>>
>>> A small chunk of per-packet metadata in Rx packet buffer preceding the
>>> actual packet data. In terms of mbuf, this could be something lying
>>> before m->data_off.
>
> I've never seen the word "Rx prefix".

Yes, I agree. The term is vendor-specific.

> In general we talk about mbuf headroom and mbuf metadata,
> the rest being the mbuf payload and mbuf tailroom.
> I guess you mean mbuf metadata in the space of the struct rte_mbuf?

Not exactly. It is rather lower level, but finally yes, it
goes to extra data represented by one or another field in
mbuf structure.
Broadly, Rx metadata is all per-packet extra information
which is available in HW and could be delivered to SW:
 - Rx checksum offload information
 - Rx packet classification
 - RSS hash
 - flow mark/flag
 - flow meta
 - tunnel offload information
 - source e-Switch port

Delivering everything is expensive. That's why we have
offload flags, the possibility to reduce required Rx packet
classification, etc. Some metadata are not covered yet
and the series suggests an approach to cover them.

>
>>>>> Rx meta data (mark, flag, tunnel ID) delivery is not an offload
>>>>> on its own since the corresponding flows must be active to set
>>>>> the data in the first place. Hence, adding offload flags
>>>>> similar to RSS_HASH is not a good idea.
>>>>
>>>> What means "active" here?
>>>
>>> Active = inserted and functional. What this paragraph is trying to say
>>> is that when you enable, say, RSS_HASH, that implies both computation of
>>> the hash and the driver's ability to extract it from packets
>>> ("delivery"). But when it comes to MARK, it's just "delivery". No
>>> "offload" here: the NIC won't set any mark in packets unless you create
>>> a flow rule to make it do so. That's the gist of it.
>
> OK
> Yes I agree RTE_FLOW_ACTION_TYPE_MARK doesn't need any offload flag.
> Same for RTE_FLOW_ACTION_TYPE_SET_META.
>
>>>>> Patch [1/5] of this series adds a generic API to let applications
>>>>> negotiate delivery of Rx meta data during initialisation period.
>
> What is a metadata?

See above.

> Do you mean RTE_FLOW_ITEM_TYPE_META and RTE_FLOW_ITEM_TYPE_MARK?
> Metadata word could cover any field in the mbuf struct so it is vague.

We failed to find a better term. Yes, it overlaps with other
Rx features. We can document exceptions and add references
to existing ways to control these exceptions.
If you have an idea how to name it, you're welcome.

>
>>>>> This way, an application knows right from the start which parts
>>>>> of Rx meta data won't be delivered.
>>>>> Hence, no necessity to try
>>>>> inserting flows requesting such data and handle the failures.
>>>>
>>>> Sorry I don't understand the problem you want to solve.
>>>> And sorry for not noticing earlier.
>>>
>>> No worries. *Some* PMDs do not enable delivery of, say, Rx mark with the
>>> packets by default (for performance reasons). If the application tries
>>> to insert a flow with action MARK, the PMD may not be able to enable
>>> delivery of Rx mark without the need to re-start Rx sub-system. And
>>> that's fraught with traffic disruption and similar bad consequences. In
>>> order to address it, we need to let the application express its interest
>>> in receiving mark with packets as early as possible. This way, the PMD
>>> can enable Rx mark delivery in advance. And, as an additional benefit,
>>> the application can learn *from the very beginning* whether it will be
>>> possible to use the feature or not. If this API tells the application
>>> that no mark delivery will be enabled, then the application can just
>>> skip many unnecessary attempts to insert wittingly unsupported flows
>>> during runtime.
>
> I'm puzzled, because we could have the same reasoning for any offload.
> I don't understand why we are focusing on mark only.
> I would prefer we find a generic solution using the rte_flow API.
> Can we make rte_flow_validate() working before port start?
> If validating a fake rule doesn't make sense,
> why not having a new function accepting a single action as parameter?

IMHO, it would be a misuse of rte_flow_validate().
It would be complex from both the application and the
driver implementation point of view, since it will most
likely be implemented in an absolutely different code branch.
Also, what should be checked for tunnel offload?

>
>> Thomas, if I'm not mistaken, net/mlx5 dv_xmeta_en driver option
>> is vendor-specific way to address the same problem.
>
> Not exactly, it is configuring the capabilities:
> +------+-----------+-----------+-------------+-------------+
> | Mode | ``MARK``  | ``META``  | ``META`` Tx | FDB/Through |
> +======+===========+===========+=============+=============+
> | 0    | 24 bits   | 32 bits   | 32 bits     | no          |
> +------+-----------+-----------+-------------+-------------+
> | 1    | 24 bits   | vary 0-32 | 32 bits     | yes         |
> +------+-----------+-----------+-------------+-------------+
> | 2    | vary 0-24 | 32 bits   | 32 bits     | yes         |
> +------+-----------+-----------+-------------+-------------+

Sorry, but I don't understand the difference.
Negotiate is exactly about capabilities which we want to use.
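
To make the negotiation idea from this thread more concrete, here is a
minimal toy model, written in Python only for brevity. It is a sketch
under stated assumptions and is not the API proposed in the series: the
names RxMeta, ToyPort, negotiate() and can_insert_mark_flow() are
hypothetical. It only illustrates the two properties argued for above:
the application states its interest before the port is started, and it
learns up front which kinds of Rx metadata will not be delivered.

```python
from enum import IntFlag


class RxMeta(IntFlag):
    """Hypothetical Rx metadata feature bits (illustrative names only)."""
    FLAG = 1
    MARK = 2
    TUNNEL_ID = 4


class ToyPort:
    """Toy port model: the Rx path layout is fixed once the port starts."""

    def __init__(self, supported):
        self.supported = supported    # what the HW/driver is able to deliver
        self.negotiated = RxMeta(0)   # what will actually be delivered
        self.started = False

    def negotiate(self, wanted):
        """Return the subset of `wanted` that the driver agrees to deliver."""
        if self.started:
            # Mirrors the constraint that the Rx format cannot change
            # while the port is in started state.
            raise RuntimeError("negotiation must happen before port start")
        self.negotiated = wanted & self.supported
        return self.negotiated

    def start(self):
        self.started = True

    def can_insert_mark_flow(self):
        """A MARK flow is only useful if mark delivery was negotiated."""
        return bool(self.negotiated & RxMeta.MARK)


# The application asks for mark and tunnel ID; this toy driver only
# supports flag and mark, so tunnel ID is dropped from the result.
port = ToyPort(supported=RxMeta.FLAG | RxMeta.MARK)
granted = port.negotiate(RxMeta.MARK | RxMeta.TUNNEL_ID)
port.start()
print("mark delivered:", bool(granted & RxMeta.MARK))
print("tunnel ID delivered:", bool(granted & RxMeta.TUNNEL_ID))
```

With such a result in hand, the application can skip inserting flows
whose metadata it already knows will not reach it, instead of probing
with flow rules at runtime.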