From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 75243A0524; Wed, 2 Jun 2021 13:57:53 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 024114069F; Wed, 2 Jun 2021 13:57:53 +0200 (CEST) Received: from shelob.oktetlabs.ru (shelob.oktetlabs.ru [91.220.146.113]) by mails.dpdk.org (Postfix) with ESMTP id D1B9940689 for ; Wed, 2 Jun 2021 13:57:51 +0200 (CEST) Received: from [192.168.38.17] (aros.oktetlabs.ru [192.168.38.17]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by shelob.oktetlabs.ru (Postfix) with ESMTPSA id 339F37F519; Wed, 2 Jun 2021 14:57:51 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 shelob.oktetlabs.ru 339F37F519 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=oktetlabs.ru; s=default; t=1622635071; bh=4Eyy+PMktioDoStXdWd+AzsqGD9bkGBrP2AvFfLvOdo=; h=Subject:To:Cc:References:From:Date:In-Reply-To; b=j/onQjOqUTUe3vczHjpPC8rPnwJptZEI8O/0Ekod/9nCmMvwiYaoQdcfqVJtqbMX1 0W4eo2r66jgohjgGUNny5IgziY4HgU4DJF9uXKJ0Q6WbNp1+Bzq5UHEmgmFIixFKzs FWDTW+0oVzZsjB0x9LHXa0b/5Y+qJK4tS++UOOYQ= To: Eli Britstein , Ilya Maximets , Ivan Malov , dev@dpdk.org Cc: Smadar Fuks , Hyong Youb Kim , Kishore Padmanabha , Ori Kam , Ajit Khaparde , Jerin Jacob , John Daley , Thomas Monjalon , Ferruh Yigit References: <20210601111420.5549-1-ivan.malov@oktetlabs.ru> <8c4f559e-3430-e2d5-1199-f1d4f4a8546d@ovn.org> <9ed06b56-26e1-5812-e357-81147e699b3b@nvidia.com> <11ed17c8-a3f4-3fcb-b11f-7c4ee903b991@oktetlabs.ru> <4e53dced-2e88-3fde-f2b7-cb2e1368c1c8@nvidia.com> <4447be7e-ce2a-a7a0-68e3-1de22a46cf80@oktetlabs.ru> From: Andrew Rybchenko Organization: OKTET Labs Message-ID: <781987b3-16b3-04b4-838b-c9ac0405d8fe@oktetlabs.ru> Date: Wed, 2 Jun 2021 14:57:51 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.10.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit Subject: Re: [dpdk-dev] [RFC PATCH] ethdev: clarify flow action PORT ID semantics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 6/2/21 2:21 PM, Eli Britstein wrote: > > On 6/2/2021 1:50 PM, Andrew Rybchenko wrote: >> External email: Use caution opening links or attachments >> >> >> On 6/2/21 12:57 PM, Eli Britstein wrote: >>> On 6/1/2021 5:53 PM, Andrew Rybchenko wrote: >>>> External email: Use caution opening links or attachments >>>> >>>> >>>> On 6/1/21 5:44 PM, Eli Britstein wrote: >>>>> On 6/1/2021 5:35 PM, Andrew Rybchenko wrote: >>>>>> External email: Use caution opening links or attachments >>>>>> >>>>>> >>>>>> On 6/1/21 4:24 PM, Eli Britstein wrote: >>>>>>> On 6/1/2021 3:10 PM, Ilya Maximets wrote: >>>>>>>> External email: Use caution opening links or attachments >>>>>>>> >>>>>>>> >>>>>>>> On 6/1/21 1:14 PM, Ivan Malov wrote: >>>>>>>>> By its very name, action PORT_ID means that packets hit an ethdev >>>>>>>>> with the >>>>>>>>> given DPDK port ID. At least the current comments don't state the >>>>>>>>> opposite. >>>>>>>>> That said, since port representors had been adopted, applications >>>>>>>>> like OvS >>>>>>>>> have been misusing the action. They misread its purpose as sending >>>>>>>>> packets >>>>>>>>> to the opposite end of the "wire" plugged to the given ethdev, for >>>>>>>>> example, >>>>>>>>> redirecting packets to the VF itself rather than to its >>>>>>>>> representor >>>>>>>>> ethdev. >>>>>>>>> Another example: OvS relies on this action with the admin PF's >>>>>>>>> ethdev >>>>>>>>> port >>>>>>>>> ID specified in it in order to send offloaded packets to the >>>>>>>>> physical >>>>>>>>> port. >>>>>>>>> >>>>>>>>> Since there might be applications which use this action in its >>>>>>>>> valid >>>>>>>>> sense, >>>>>>>>> one can't just change the documentation to greenlight the opposite >>>>>>>>> meaning. >>>>>>>>> This patch adds an explicit bit to the action configuration which >>>>>>>>> will let >>>>>>>>> applications, depending on their needs, leverage the two meanings >>>>>>>>> properly. >>>>>>>>> Applications like OvS, as well as PMDs, will have to be corrected >>>>>>>>> when the >>>>>>>>> patch has been applied. But the improved clarity of the action is >>>>>>>>> worth it. >>>>>>>>> >>>>>>>>> The proposed change is not the only option. One could avoid >>>>>>>>> changes >>>>>>>>> in OvS >>>>>>>>> and PMDs if the new configuration field had the opposite meaning, >>>>>>>>> with the >>>>>>>>> action itself meaning delivery to the represented port and not to >>>>>>>>> DPDK one. >>>>>>>>> Alternatively, one could define a brand new action with the said >>>>>>>>> behaviour. >>>>>>> It doesn't make any sense to attach the VF itself to OVS, but >>>>>>> only its >>>>>>> representor. >>>>>> OvS is not the only DPDK application. >>>>> True. It is just the focus of this commit message is OVS. >>>>>>> For the PF, when in switchdev mode, it is the "uplink >>>>>>> representor", so >>>>>>> it is also a representor. >>>>>> Strictly speaking it is not a representor from DPDK point of >>>>>> view. E.g. representors have corresponding flag set which is >>>>>> definitely clear in the case of PF. >>>>> This is the per-PMD responsibility. The API should not care. >>>>>>> That said, OVS does not care of the type of the port. It doesn't >>>>>>> matter >>>>>>> if it's an "upstream" or not, or if it's a representor or not. >>>>>> Yes, it is clear, but let's put OvS aside. Let's consider a >>>>>> DPDK application which has a number of ethdev port. Some may >>>>>> belong to single switch domain, some may be from different >>>>>> switch domains (i.e. different NICs). Can I use PORT_ID action >>>>>> to redirect ingress traffic to a specified ethdev port using >>>>>> PORT_ID action? It looks like no, but IMHO it is the definition >>>>>> of the PORT_ID action. >>>>> Let's separate API from implementation. By API point of view, yes, the >>>>> user may request it. Nothing wrong with it. >>>>> >>>>>   From implementation point of view - yes, it might fail, but not for >>>>> sure, even if on different NICs. Maybe the HW of a certain vendor has >>>>> the capability to do it? >>>>> >>>>> We can't know, so I think the API should allow it. >>>> Hold on. What should it allow? It is two opposite meanings: >>>>    1. Direct traffic to DPDK ethdev port specified using ID to be >>>>       received and processed by the DPDK application. >>>>    2. Direct traffic to an upstream port represented by the >>>>       DPDK port. >>>> >>>> The patch tries to address the ambiguity, misuse it in OvS >>>> (from my point of view in accordance with the action >>>> documentation), mis-implementation in a number of PMDs >>>> (to work in OvS) and tries to sort it out with an explanation >>>> why proposed direction is chosen. I realize that it could be >>>> painful, but IMHO it is the best option here. Yes, it is a >>>> point to discuss. >>>> >>>> To start with we should agree that that problem exists. >>>> Second, we should agree on direction how to solve it. >>> I agree. Suppose port 0 is the PF, and port 1 is a VF representor. >>> >>> IIUC, there are two options: >>> >>> 1. flow create 1 ingress transfer pattern eth / end action port_id id 0 >>> upstream 1 / end >>> >>> 2. flow create 1 ingress transfer pattern eth / end action port_id id 0 >>> upstream 0 / end >>> >>> [1] is the same behavior as today. >>> >>> [2] is a new behavior, the packet received by port 0 as if it arrived >>> from the wire. >>> >>> Then, let's have more: >>> >>> 3. flow create 0 ingress transfer pattern eth / end action port_id id 1 >>> upstream 1 / end >>> >>> 4. flow create 0 ingress transfer pattern eth / end action port_id id 1 >>> upstream 0 / end >>> >>> if we have [2] and [4], the packet going from the VF will hit [2], then >>> hit [4] and then [2] again in an endless loop? >> As I understand PORT_ID is a fate action. So, no more lookups >> are done. If the packet is loop back from applications, loop is >> possible. > > I referred a HW loop, not SW. For example with JUMP action (also fate): > > flow create 0 group 0 ingress transfer pattern eth / end action jump > group 1 / end > > flow create 0 group 1 ingress transfer pattern eth / end action jump > group 0 / end IMHO jump is an internal fate action. Fate action for a corresponding group. But PORT_ID is a real fate action. So, no more lookups are done at all. So, no HW loop. >> >> In fact, it is a good question if "flow creare 0 ingress >> transfer" or "flow create 1 ingress transfer" assume any >> implicit filtering. I always thought that no. >> i.e. if we have two network ports rule like >>    flow create 0 ingress transfer pattern eth / end \ >>         action port_id id 1 upstream 1 / end >> will match packets incoming from any port into the switch >> (network port 0, network port 1, VF or PF itself (???)). >> The topic also requires explicit clarification. > rte_flow is port based. It implicitly filters only packets for the > provided port (0). As I've written below I disagree. If I'm missing some documentation about it, please, help to find it. Otherwise, it is open question and a point to discuss. > Maybe need to clarify documentation and have a "no filtering" API if > needed. > >> PF itself is really a hard question because of "ingress" >> since traffic from PF is a traffic from DPDK application and >> it is egress, not ingress. > > Ingress means the direction. Hit on packets otherwise provided to the SW > by rte_eth_rx_burst(). > > Same goes for the PF. Packets by rte_eth_rx_burst are the ones arriving > from the wire, so ingress is that direction and egress is from the app. Yes, of course. But I'm talking about no implicit filtering case. If so, ideally it should not match packets arriving into a transfer switch from PF (sent using rte_eth_tx_burst()). I.e. here I'm talking about implicit filtering specified using ingress or egress direction. >> >> I think that port ID used to created flow rule should not >> apply any filtering in the case of transfer since we have >> corresponding items to do it explicitly. If we do it implicitly >> as well, we need some priorities and a way to avoid implicit >> rules which makes things much harder to understand and >> implement. > > If "upstream 0" means what I thought it means (comments?) maybe a better > way to do it is expose another port for that, so there will be 2 "PF" > ports - one as the wire representor and the other one as the "PF" (or > clearer naming...). Yes, "upstream 0" means that the packet should be delivered to DPDK port specified by ID to be received using rte_eth_rx_burst(). Solution with network port representors is possible, but IMHO it just adds complexity. If an application wants to route some traffic to itself for further processing, it will require extra efforts and signaling in order to provide ingress port information. If such traffic is delivered via corresponding representor, the information is provided in a native way. > This would be a vendor decision, and there would be no need to change > PORT_ID API. Sorry, I still think that PORT_ID action behaviour assumed and used by OvS in the case of PF is wrong and mismatch action definition. Representors case is questionable and really related to representor definition discussion pointed out by Ilya. >> >>> If this is your meaning, maybe what you are looking for is an action to >>> change the in_port and continue processing? >>> >>> Please comment on the examples I gave or clarify the use case you are >>> trying to do. >>> >>> >>> Thanks, >>> >>> Eli >>> >>>>>>>> We had already very similar discussions regarding the >>>>>>>> understanding of >>>>>>>> what >>>>>>>> the representor really is from the DPDK API's point of view, and >>>>>>>> the >>>>>>>> last >>>>>>>> time, IIUC, it was concluded by a tech. board that representor >>>>>>>> should be >>>>>>>> a "ghost of a VF", i.e. DPDK APIs should apply configuration by >>>>>>>> default to >>>>>>>> VF and not to the representor device: >>>>>>>> >>>>>>>> https://patches.dpdk.org/project/dpdk/cover/20191029185051.32203-1-thomas@monjalon.net/#104376 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> This wasn't enforced though, IIUC, for existing code and >>>>>>>> semantics is >>>>>>>> still mixed. >>>>>>> I am not sure how this is related. >>>>>>>> I still think that configuration should be applied to VF, and the >>>>>>>> same >>>>>>>> applies >>>>>>>> to rte_flow API.  IMHO, average application should not care if >>>>>>>> device is >>>>>>>> a VF itself or its representor.  Everything should work exactly the >>>>>>>> same. >>>>>>>> I think this matches with the original idea/design of the switchdev >>>>>>>> functionality >>>>>>>> in the linux kernel and also matches with how the average user >>>>>>>> thinks >>>>>>>> about >>>>>>>> representor devices. >>>>>>> Right. This is the way representors work. It is fully aligned with >>>>>>> configuration of OVS-kernel. >>>>>>>> If some specific use-case requires to distinguish VF from the >>>>>>>> representor, >>>>>>>> there should probably be a separate special API/flag for that. >>>>>>>> >>>>>>>> Best regards, Ilya Maximets.