From: Andrew Rybchenko
To: Yongseok Koh
CC: "Wang, Haiyue", Shahaf Shuler, Thomas Monjalon, "Yigit, Ferruh", Adrien Mazarguil, olivier.matz@6wind.com, dev@dpdk.org, "Ananyev, Konstantin"
Date: Wed, 19 Jun 2019 12:05:50 +0300
Message-ID: <57f8ee8d-7db3-6faa-f4d0-25c3eff80e48@solarflare.com>
In-Reply-To: <20190611000505.GA25815@mtidpdk.mti.labs.mlnx>
References: <20190603213231.27020-1-yskoh@mellanox.com> <7047a597-ea0d-f159-e95d-0fd8bca5b78d@solarflare.com> <82445af1-9e66-9de9-f3d2-176de09d904b@solarflare.com> <20190611000505.GA25815@mtidpdk.mti.labs.mlnx>
Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
List-Id: DPDK patches and discussions

On 11.06.2019 3:06, Yongseok Koh wrote:
> On Mon, Jun 10, 2019 at 10:20:28AM +0300, Andrew Rybchenko wrote:
>> On 6/10/19 6:19 AM, Wang, Haiyue wrote:
>>>> -----Original Message-----
>>>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrew Rybchenko
>>>> Sent: Sunday, June 9, 2019 22:24
>>>> To: Yongseok Koh; shahafs@mellanox.com; thomas@monjalon.net;
>>>> Yigit, Ferruh; adrien.mazarguil@6wind.com; olivier.matz@6wind.com
>>>> Cc: dev@dpdk.org
>>>> Subject: Re: [dpdk-dev] [RFC 1/3] ethdev: extend flow metadata
>>>>
>>>> On 6/4/19 12:32 AM, Yongseok Koh wrote:
>>>>> Currently, metadata can be set on the egress path via the mbuf
>>>>> tx_metadata field with the PKT_TX_METADATA flag, and
>>>>> RTE_FLOW_ITEM_TYPE_RX_META matches metadata.
>>>>>
>>>>> This patch extends the usability.
>>>>>
>>>>> 1) RTE_FLOW_ACTION_TYPE_SET_META
>>>>>
>>>>> When supporting multiple tables, Tx metadata can also be set by a
>>>>> rule and matched by another rule. This new action allows metadata
>>>>> to be set as a result of flow match.
>>>>>
>>>>> 2) Metadata on ingress
>>>>>
>>>>> There's also a need to support metadata on packet Rx. Metadata can
>>>>> be set by the SET_META action and matched by the META item like Tx.
>>>>> The final value set by the action will be delivered to the
>>>>> application via the mbuf metadata field with the PKT_RX_METADATA
>>>>> ol_flag.
>>>>>
>>>>> For this purpose, mbuf->tx_metadata is moved to a separate new
>>>>> field and renamed to 'metadata' to support both Rx and Tx metadata.
>>>>>
>>>>> For loopback/hairpin packets, metadata set on Rx/Tx may or may not
>>>>> be propagated to the other path depending on HW capability.
>>>>>
>>>>> Signed-off-by: Yongseok Koh
>>>>
>>>> There is a mark on Rx which is delivered to the application in
>>>> hash.fdir.hi. Why do we need one more 32-bit value set by the NIC
>>>> and delivered to the application?
>>>> What is the difference between MARK and META on Rx?
>>>> When should an application use MARK and when META?
>>>> Are there cases when both could be necessary?
>>>>
>>> In my understanding, MARK is an FDIR-related thing, while META seems
>>> to be NIC-specific. And we also need this kind of specific data field
>>> to export the NIC's data to the application.
>> I think it is better to avoid NIC vendor-specifics in the motivation.
>> I understand that it exists for you, but I think it is better to look
>> at it from the rte_flow API definition point of view: both are 32-bit
>> (except for endianness, and I'm not sure I understand why meta is
>> defined as big-endian since it is not a value coming from or going to
>> the network in a packet; I'm sorry that I missed it on review at the
>> time), both may be set using an action on Rx, and both may be matched
>> using a pattern item.
> Yes, MARK and META have the same characteristics on the Rx path. Let
> me clarify why I picked this way.
>
> What if the device has more bits to deliver to the host? Currently,
> only 32 bits of data can be delivered to the user via the MARK ID. Now
> we have more requests from users (OVS connection tracking) who want to
> see more information generated during flow match from the device.
> Let's say it is 64 bits, and it may contain intermediate match results
> to keep track of multi-table match, the address of a callback function
> to call, or so. I thought about extending the current MARK to 64 bits,
> but I knew that we couldn't make more room in the first cacheline of
> mbuf, where every vendor has their critical interest. And FDIR has
> been there for a long time and has lots of use-cases in DPDK (not easy
> to break). This is why I'm suggesting to obtain another 32 bits in the
> second cacheline of the structure.
>
> Also, I thought about another scenario as well. Even though we have
> the MARK item introduced lately, it isn't used by any PMD at all for
> now, meaning it might not be match-able on a certain device. What if
> there are two types of registers on Rx and one is match-able while the
> other isn't? A PMD can use META for the match-able register while MARK
> is used for the non-match-able register without supporting item match.
> If MARK simply becomes 64-bit just because it has the same
> characteristics in terms of rte_flow, only one of such registers can
> be used, as we can't say only part of the bits are match-able on the
> item.
> Instead of extending MARK to 64 bits, I thought it would be better to
> give more flexibility by bundling it with Tx metadata, which can be
> set via mbuf.

Thanks a lot for the explanations.

If this way is finally approved, the priority between META and MARK
should be defined. I.e. if only one is supported, or only one may be
matched, it must be MARK. Otherwise, it will be too complicated for
applications to find out which one to use.

Are there any limitations on the usage of MARK or META in transfer
rules?

There is a lot of work on documentation in this area to make it usable.

> The actual issue we have may be how we can make it scalable. What if
> there's more need to carry more data from the device? Well, IIRC,
> Olivier once suggested to put a pointer (like mbuf->userdata) to
> extend the mbuf struct beyond two cachelines. But we still have some
> space left at the end.
>
>>>> Moreover, the third patch adds 32-bit tags which are not delivered
>>>> to the application. Maybe META/MARK should simply be a kind of TAG
>>>> (e.g. with index 0 or marked using an additional attribute) which
>>>> is delivered to the application?
> Yes, TAG is a kind of transient device-internal data which isn't
> delivered to the host. It would be a design choice. I could define all
> these kinds as an array of MARK IDs having different attributes - some
> are exportable/match-able and others are not, which sounds quite
> complex. As rte_flow doesn't have a direct way to check device
> capability (the user has to call a series of validate functions
> instead), I thought defining TAG would be better.
>
>>>> (It is either API breakage (if tx_metadata is removed) or ABI
>>>> breakage if metadata and tx_metadata will share a new location
>>>> after shinfo.)
> Fortunately, mlx5 is the only entity which uses tx_metadata so far.

As I understand, it is still a breakage.

>>> Make use of udata64 to export NIC metadata to the application?
>>>         RTE_STD_C11
>>>         union {
>>>                 void *userdata;   /**< Can be used for external metadata */
>>>                 uint64_t udata64; /**< Allow 8-byte userdata on 32-bit */
>>>                 uint64_t rx_metadata;
>>>         };
>> As I understand, it does not work for Tx, and I'm not sure that it is
>> a good idea to have different locations for Tx and Rx.
>>
>> The RFC adds it at the end of mbuf, but that was rejected before
>> since it eats space in the mbuf structure (CC Konstantin).
> Yep, I was in the discussion. IIRC, the reason wasn't because it ate
> space but because it could recycle unused space on the Tx path. We
> still have 16B after shinfo and I'm not sure how many bytes we should
> reserve. I think reserving space for one pointer would be fine. I have
> no strong opinion.

Thanks,
Andrew.

> Thanks,
> Yongseok
>
>> There is a long discussion on the topic before [1], [2], [3] and [4].
>>
>> Andrew.
>>
>> [1] http://mails.dpdk.org/archives/dev/2018-August/109660.html
>> [2] http://mails.dpdk.org/archives/dev/2018-September/111771.html
>> [3] http://mails.dpdk.org/archives/dev/2018-October/114559.html
>> [4]
>> http://mails.dpdk.org/archives/dev/2018-October/115469.html