From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from out1-smtp.messagingengine.com (out1-smtp.messagingengine.com [66.111.4.25]) by dpdk.org (Postfix) with ESMTP id 47C90A495 for ; Mon, 15 Jan 2018 17:09:59 +0100 (CET) Received: from compute1.internal (compute1.nyi.internal [10.202.2.41]) by mailout.nyi.internal (Postfix) with ESMTP id 988F520C5A; Mon, 15 Jan 2018 11:09:58 -0500 (EST) Received: from frontend1 ([10.202.2.160]) by compute1.internal (MEProxy); Mon, 15 Jan 2018 11:09:58 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=monjalon.net; h= cc:content-transfer-encoding:content-type:date:from:in-reply-to :message-id:mime-version:references:subject:to:x-me-sender :x-me-sender:x-sasl-enc; s=mesmtp; bh=h7lxcdRzV0YTnaAs9o1MpC1qEG OEf5EGi4X2l3bsfEw=; b=X5qCZzjgUXK4Si4S8wRWSA8mROHQQV0FwY14L9DgYb SQnJ7tYjQKgkLgbLrda0YCsZlFANCreLHAK6HOlFScvmlUS+ccWeXxmC1t0DxWss wb2ubbB+ArgS5OjQhrYhITdjAFC5Gu4OJUt8Oau476q+yqD3Vpw7CAFkuV4+2n+M Q= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:content-transfer-encoding:content-type :date:from:in-reply-to:message-id:mime-version:references :subject:to:x-me-sender:x-me-sender:x-sasl-enc; s=fm1; bh=h7lxcd RzV0YTnaAs9o1MpC1qEGOEf5EGi4X2l3bsfEw=; b=OS8ZqqdSh35kdfPcuv3pSL xPO65aFkYDpmmm/70E1W0uF345hR1hEj+vAFkisWju0Y/kp8imax8LFGRQvHyGqs gT7HYaAu2ZeoY/9pGdr65AswxW0Na8BpT7xt1Opp+Jl4/vQyF1r97UYGsvNQU1G3 P8nZsQdX58s+2RvFPZ7csQl0Uz3QjLfpOj42tzuJdRMmv9Lf9dywR0mudyAOs5kB nlzKQCwzNpFQ9o3+4slHn5qKVfewSRmQ5oK+RXfbOOJ4DEYHuqV/2/VXjVX9FbFH NB4PMjWPopkFBQdtQEWdEYMPINGQSM4SD1+nsmqLAyNkIXjPtKr3VPgvKrfyrE+Q == X-ME-Sender: Received: from xps.localnet (184.203.134.77.rev.sfr.net [77.134.203.184]) by mail.messagingengine.com (Postfix) with ESMTPA id 29B357E51F; Mon, 15 Jan 2018 11:09:58 -0500 (EST) From: Thomas Monjalon To: "Doherty, Declan" Cc: Remy Horton , Mohammad Abdul Awal , dev@dpdk.org, John McNamara , Wenzhuo Lu , Jingjing Wu , Alejandro Lucero , vincent.jardin@6wind.com, yuanhanl@mellanox.com, alexr@mellanox.com Date: Mon, 15 Jan 2018 17:09:26 +0100 Message-ID: <2161092.oodcdG3sB1@xps> In-Reply-To: References: <20180108143720.7994-1-remy.horton@intel.com> <2447572.KKM6cvjoYG@xps> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Subject: Re: [dpdk-dev] [PATCH v4 0/5] lib: add Port Representors X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 15 Jan 2018 16:09:59 -0000 15/01/2018 13:12, Doherty, Declan: > On 10/01/2018 7:26 PM, Thomas Monjalon wrote: > > 10/01/2018 14:46, Doherty, Declan: > >> On 09/01/2018 11:22 PM, Thomas Monjalon wrote: > >>> Hi, > >>> > >>> 08/01/2018 15:37, Remy Horton: > >>>> Port Representors provide a logical presentation in DPDK of VF (virtual > >>>> function) ports for the purposes of control and monitoring. Each port > >>>> representor device represents a single VF and is associated with it's > >>>> parent physical function (PF) PMD which provides the back-end hooks for > >>>> the representor device ops and defines the control domain to which that > >>>> port belongs. This allows to use existing DPDK APIs to monitor and control > >>>> the port without the need to create and maintain VF specific APIs. > >>> > >>> Extending control plane ability of DPDK has been discussed > >>> multiple times. > >> > >> It has, and I have yet to see a really strong reason as to why we would > >> not support control plane functions within DPDK, many of which are > >> already support today implicitly anyway through our ethdev APIs. > >> > >>> The current agreed policy is: > >>> " > >>> The primary goal of DPDK is to provide a userspace dataplane. > >>> Managing VFs from a PF driver is a control plane feature and developers > >>> should generally rely on the Linux Kernel for that. > >>> " > >>> http://dpdk.org/doc/guides/contributing/design.html#pf-and-vf-considerations > >>> > >> > >> My understanding is that this particular entry was based around the > >> discussion on the divergence of functionality between the Linux kernel > >> PF driver and the DPDK PF driver. I also don't really think the above > >> statement is valid as a blanket statement for the project as it makes > >> the assumption that DPDK is only deployed on Linux hosts, what about > >> FreeBSD? or in the future Windows? > > > > Yes, we must agree on removing this scope limitation while working > > on a generic VF representor. > > > >> A number of presentations at both Userspace in Dublin and the Summit > >> in San Jose discussed the support of control plane functionality by > >> DPDK and there wasn't any strong arguments or opposition against using > >> DPDK for control plane functions that I saw. > >> > >> In any case this patchset is not introducing any new control plane APIs > >> that don't already exist within DPDK today, it only enables the creation > >> of a new type of virtual PMDs which are linked to the same base > >> infrastructure and which can be used to represent VFs in a control plane > >> application as we have implemented in this patch set. > >> > >>> If we relax this policy, I think the representor solution should be > >>> a real port, not only "for the purposes of control and monitoring". > >>> It has been asked several times as replies to this series, > >>> but it is kindly ignored, saying it will be thought later. > >>> > >> > >> I think we have stated in multiple discussions, especially during the > >> userspace presentation back in September that this solution supports > >> data path on the representors PMDs, and we have used the > >> infrastructure proposed here to do exactly what you are asking. As the > >> representor infrastructure doesn't preclude the support of a data > >> path, we have used it as it is presented here to implement a data path > >> for exception path packets for a prototype vswitch offload implementation. > >> > >> > >>> I don't see a general agreement on this series so far. > >>> > >> > >> I think the main issue of contention is that there is a > >> misunderstanding that this implementation only supports control plane > >> management and monitoring, but that is not the case and it can be used > >> for full data path representors, with limited or no control plane > >> functionality if required, at the end of the day the only limitations > >> are based on what is implemented by the backend base driver were the > >> broker is running for the representor ports. > > > > The misunderstanding may originates from what you describe (even in v5): > > "ports for the purposes of control and monitoring" > > > noted, but that is the scope of what we demostrate in the patchset, but > we'll update the introduction to reflect the fact that they can be used > to also support data path functions, such as exception path traffic for > hw switch. > > > I think everybody agree to have VF representors in DPDK. > > But there are few things which are not generic enough, > > and not easy to use. > > I hoped the discussion started at Dublin would continue > > on the mailing list but I realize the joint effort with other vendors > > did not happen. > > I will elaborate quickly below and more detailed in later review. > > > > 1/ In order to keep track of the relations between PF, VF and > > representor - which are all ethdev - you create a struct outside > > of ethdev. I think it should be managed inside ethdev API. > > Initially we had implemented the representor functionality within the > context of the ethdev library but ran into a number of scenarios where > this didn't work well as it makes the assumption that the base device > that the representors are attached to is always an ethdev, we ran into > cases were the PF isn't necessarily an ethdev, for example in some > smartNICs the PF would be better represented by a switchdev, or it is > possible that the device hosting the representor broker could just > provide a conduit to a kernel driver. The base device may be something else than an ethdev PF, so it may be represented as a rte_device. But the VF and representor are still ethdev. I still think the relationship should be described in ethdev. > > As suggested by others, we could also think whether a switchdev API > > is interesting or not. > > Indeed if a switchdev is something that is required by the community it > would make sense that the representor infrastructure was initialized > within the switchdev and not an ethdev. The advantage of keeping the > representor infrastructure independent is that it gives the flexibility > for representors to be supported independently of device type they are > attached to. switchdev is just another device class. We must abstract the base device as a basic EAL rte_device for now. > > 2/ You create a new library for ethdev device management. > > It is the responsibility of the bus to scan and probe devices. > > In Intel case, the representor has no real bus so it must rely on > > the vdev bus. Note that the new custom scan hook may help. > > This isn't the case in latest versions of the patchset, the bus the > representors are dependent on is that of the base device, so for the > i40e it's the PF PCI device. I'm suggesting to use vdev as bus of i40e representor, because there is no PCI identifier for them, right? > > In Mellanox case, the representor already exists in Linux, and is based > > on PCI bus. > > Then, as any other port, it must be managed with whitelist or blacklist. > > I think the suggestion by Yuanhan of using the device whitelist command > option makes sense as a option from the commandline, but it would > require the newly propose implementation which allows specification of > both the bus and device as not all devices are PCI, which have multiple > host ports using SR-IOV, but there are cases when an dynamic > creation/destruction of ports may also need to be supported, which is > what the representor APIs support. The full proposed syntax of device identification is: http://dpdk.org/ml/archives/dev/2017-December/084572.html With this generic syntax, we can describe port representors and request their initialization via whitelisting. [...] > > 3/ You are using PCI address + index to identify the representor. > > It is a no-go. We have made effort to abstract buses. > > As an idea, the identification of a representor could use the new > > proposed flexible device syntax. > > We are currently using net_representor_%bus%_%device_id%_%vport_id% to > identify each representor device but I have no issue changing to either > the current convention which would be net_representor_%unique_id% or if > I understand the proposal in the RFC "ether: standardize getting the > port by name" we would be using something like, > we should be looking at something along the lines of > net_%bus%_%device_id%_%port_id% which is pretty close to what we are > using now. It is introducing yet another syntax. It would be better to align every device identification usages on a common and standard syntax. Then the syntax parsing will be done in bus and libs (e.g. ethdev) with some helper functions which can be re-used for every usages. The latest version of representors is using direct PCI parsing instead of relying on some bus or ethdev helpers. > In terms of that RFC I'm not clear on if the proposal is just to affect > the API for getting a port by name, or actually the name name assigned > to the device itself. It is not about the name. It is a proposal of syntax to describe a resource of a hardware device, or a virtual device. The most obvious usage is to replace -w/-b and --vdev. It can also be used in OVS or any other DPDK configuration. > > 4/ Such new API must be experimental. > > We will address this in the next revision > > > I propose to better think the representor API inside ethdev > > with a good multi-vendor collaboration, > > and submit a deprecation notice for 18.05 integration. > > I would really like to see this included as experimental in 18.02 > release, if it is agreed by the community that we need to re-integate > the representor concept into librte_ethdev during for 18.05 we will > support that work. I don't see the benefit of introducing a lib and a syntax which would be replaced in the next release.