Message-ID: <585584a088a1ff95cb13e0fbf58d7bf70777217c.camel@debian.org>
From: Luca Boccassi
To: Bruce Richardson
Cc: Thomas Monjalon, dev@dpdk.org, Christian Ehrhardt, Jerin Jacob Kollanukkaran, Vamsi Krishna Attunuru,
 arybchenko@solarflare.com, ferruh.yigit@intel.com, maxime.coquelin@redhat.com, Stephen Hemminger, Alex Williamson, david.marchand@redhat.com, ktraynor@redhat.com, anatoly.burakov@intel.com, konstantin.ananyev@intel.com, honnappa.nagarahalli@arm.com, Liang-Min Wang, Alexander Duyck, Peter Xu, Eric Auger
Date: Tue, 05 Nov 2019 10:09:01 +0000
In-Reply-To: <20191104111610.GC1356@bricha3-MOBL.ger.corp.intel.com>
References: <20190906091230.13923-1-vattunuru@marvell.com> <1612178.XsdEgM4R2a@xps> <1659615.GCIDYkGxRJ@xps> <7c5ea87954223c075529515e4ff20d9036899d02.camel@debian.org> <20191104111610.GC1356@bricha3-MOBL.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/1] kernel/linux: introduce vfio_pf kernel module
List-Id: DPDK patches and discussions

On Mon, 2019-11-04 at 11:16 +0000, Bruce Richardson wrote:
> On Fri, Nov 01, 2019 at 11:54:45AM +0000, Luca Boccassi wrote:
> > For distros, out-of-tree kernel modules are painful. From my POV, it
> > would be preferable to try and find a solution upstream, even if it is
> > going to be difficult and require a lot of negotiation and work.
> >
>
> I don't think anyone would disagree that getting an up-stream in-kernel
> solution is the desired end-state. However, even if we accept that, it
> doesn't necessarily help us as we need to decide on this support on DPDK
> right now.
> The factors to take into account are:
>
> * we don't have definite line-of-sight to an in-kernel solution - it may
>   never come
> * even if it does eventually materialize, it will be months before it is in
>   a released kernel, more months before it makes it to stable/LTS distros,
>   and more months and years thereafter before it actually makes it into
>   deployed systems from users. Having an out-of-tree module in DPDK makes
>   it available to users much, much sooner.
> * there seems to be a real need for this support.
>
> For me, the key point seems to be the last one - if the feature is needed
> and likely to be used by a reasonable number of users (i.e. not just 1 or 2).
> If it is needed, then we need to have a path to support it, and, right now,
> I'm not seeing any such path other than having support in an out-of-tree
> module in DPDK itself.
>
> My 2c.
>
> /Bruce

That's a fair point. If, as a project, we want to move toward having no
out-of-tree modules, what I'd like to see is a firm statement, a concrete
plan and a commitment via the roadmap to keep working on an upstream
solution, however hard it might be, and not just declare the job done and
move on once the OOT module is accepted. With that in place, having a
module in the interim is acceptable to me.

> > On Thu, 2019-10-31 at 18:03 +0100, Thomas Monjalon wrote:
> > > We don't get enough attention on this topic.
> > > Let me rephrase the issue and the proposals with more people Cc'ed.
> > >
> > > We are talking about SR-IOV VFs in VMs
> > > with a PF managed on the host by DPDK.
> > > The PF driver is either a (1) bifurcated (Mellanox case),
> > > or (2) bound to UIO with igb_uio, or (3) bound to VFIO.
> > >
> > > In case 1, the PF is still managed by a kernel driver, so no issue.
> > >
> > > In case 2, the PF is managed by UIO.
> > > There is no SR-IOV support in upstream UIO,
> > > but the out-of-tree module igb_uio works.
> > > However we would like to drop this legacy module from DPDK.
> > > Some (most) Linux distributions do not package igb_uio anyway.
> > > The other issue is that igb_uio is using physical addressing,
> > > which is not acceptable with OCTEON TX2 for performance reason.
> > >
> > > In case 3, the PF is managed by VFIO. This is the case we want to fix.
> > > VFIO does not allow to create VFs.
> > > The workaround is to create VFs before binding the PF to VFIO.
> > > But since Linux 4.19, VFIO forbids any SR-IOV VF management.
> > > There is a security concern about allowing userspace to manage SR-IOV
> > > VF messages and taking the responsibility for VFs in the guest.
> > >
> > > It is desired to allow the system admin deciding the security levels,
> > > by adding a flag in VFIO "let me manage VFs, I know what I am doing".
> > > Reference of "recent" discussion:
> > > https://lkml.org/lkml/2018/3/6/855
> > >
> > > For now, there is no upstream solution merged.
> > >
> > > This patch is proposing a solution using an out-of-tree module.
> > > In this case, the admin will decide explicitly to bind the PF to vfio_pf.
> > > Unfortunately this solution won't work in environments which
> > > forbid any out-of-tree module.
> > > Another concern is that it looks like DPDK-only solution.
> > >
> > > We have an issue but we do not want to propose a half-solution
> > > which would harm other projects and users.
> > > So the question is:
> > > Do we accept this patch as a temporary solution?
> > > Or can we get an agreement soon for an upstream kernel solution?
> > >
> > > Thanks for reading and giving your (clear) opinion.
> > >
> > >
> > > 06/09/2019 15:27, Jerin Jacob Kollanukkaran:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 06/09/2019 11:12, vattunuru@marvell.com:
> > > > > > From: Vamsi Attunuru <vattunuru@marvell.com>
> > > > > >
> > > > > > The DPDK use case such as VF representer or OVS offload etc would call
> > > > > > for PF and VF PCIe devices to bind vfio-pci module to enable IOMMU
> > > > > > protection.
> > > > > >
> > > > > > In addition to vSwitch use case, unlike, other PCI class of devices,
> > > > > > Network class of PCIe devices would have additional responsibility on
> > > > > > the PF devices such as promiscuous mode support etc.
> > > > > >
> > > > > > The above use cases demand VFIO needs bound to PF and its VF devices.
> > > > > > This is use case is not supported in Linux kernel, due to a security
> > > > > > issue where it is possible to have DoS in case if VF attached to guest
> > > > > > over vfio-pci and netdev kernel driver runs on it and which something
> > > > > > VF representer would like to enable it.
> > > > > >
> > > > > > Since we can not differentiate, the vfio-pci bounded VF devices runs
> > > > > > DPDK application or netdev driver in guest, we can not introduce any
> > > > > > scheme to fix DoS case and therefore not have proper support of this
> > > > > > in the upstream kernel.
> > > > > >
> > > > > > The igb_uio enables such PF and VF binding support for non-iommu
> > > > > > devices to make VF representer or OVS offload run on non-iommu devices
> > > > > > with DoS vulnerability for netdev driver as VF.
> > > > > >
> > > > > > This kernel module, facilitate to enable SRIOV on PF devices,
> > > > > > therefore, to run both PF and VF devices in VFIO mode knowing its
> > > > > > impacts like igb_uio driver functions of non-iommu devices.
> > > > > >
> > > > > > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > Sorry I fail to properly understand the explanation above.
> > > > > Please try to split in shorter sentences.
> > > > >
> > > > > About the request to add an out-of-tree Linux kernel driver, I
> > > > > guess Jerin is well aware that we don't want such anymore.
> > > >
> > > > Yes. I am aware of it. I don't like the out of tree modules either.
> > > > But, This case, I suggested Vamsi to have out of tree module.
> > > >
> > > > Let me describe the issue and let us discuss how to tackle
> > > > the problem:
> > > >
> > > > # Linux kernel wont allow VFIO PF to have SRIOV enable.
> > > >
> > > > Patches and on going discussion are here:
> > > > https://patchwork.kernel.org/patch/10522381/
> > > > https://lwn.net/Articles/748526/
> > > >
> > > > Based on my understanding the reason for NOT allowing the
> > > > VFIO PF to have SRIOV enable is genuine from kernel point of
> > > > View but not from DPDK point of view.
> > > >
> > > > Here is the sequence to describe the problem
> > > > 1) Consider Linux kernel allowed VFIO PCI SRIOV enable
> > > > 2) PF bound to vfio-pci
> > > > 3) using SRIOV infrastructure of vfio-pci PF driver,
> > > >    VFs are created
> > > > 4) DPDK application bound to PF and VF, No issue here.
> > > > 5) Assume DPDK application bound to PF and VF bound
> > > >    To netdev kernel driver.
> > > >    Now, there is a genuine concern
> > > >    From kernel point of view that, DPDK PF can intercept,
> > > >    VF mailbox message or so and deny the Kernel request
> > > >    Or what if DPDK PF application crashes?
> > > >
> > > > To avoid the case (5), (3) is not allowed in stock kernel.
> > > > Which makes sense IMO.
> > > >
> > > > Now, From DPDK PoV, step 5 is valid as we have
> > > > Rte_flow's VF action etc used to enable such case.
> > > > Where, user can program the PF's rte_flow to steer
> > > > Some traffic to VF, where VF can be, DPDK application or
> > > > Linux kernel netdev driver.
> > > >
> > > > This patch enables the step (3) to enable step (5) from DPDK
> > > > PoV. i.e DPDK needs to allow PF to bind to DPDK with VFs.
> > > >
> > > > Why this issue now:
> > > > - igb_uio kernel driver is used as enabling step (3)
> > > >   See store_max_vfs() kernel/linux/igb_uio/igb_uio.c
> > > >   This is fine for non-iommu device, IOMMU devices
> > > >   needs VFIO.
> > > > - We would like support VFIO for IOMMU protection
> > > >   And enable step (5) as DPDK supports form the spec level.
> > > >   i.e need to fix feature disparity between iommu vs
> > > >   non-iommu based devices.
> > > >
> > > > Note:
> > > > We may not need a brand new kernel module, we could move
> > > > this logic to igb_uio if maintenance is concern.
> > >
> >
> > --
> > Kind regards,
> > Luca Boccassi

--
Kind regards,
Luca Boccassi
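
Note on the case 3 workaround quoted above ("create VFs before binding the PF to VFIO"): the ordering can be sketched roughly as below. This is only an illustration, not something posted in the thread. It assumes the standard PCI sysfs attributes sriov_numvfs, driver/unbind, driver_override and drivers_probe; the function names and the PCI address are made up for the example. Whether the VFs survive the PF driver unbind depends on the PF driver, and, as Thomas notes, kernels since Linux 4.19 are reported to refuse this setup, so the final probe may well fail.

```python
from pathlib import Path


def plan_vfs_then_vfio(bdf, num_vfs, sysfs_root="/sys"):
    """Return the ordered (path, value) sysfs writes for the workaround.

    bdf is the PF's PCI address (a placeholder such as "0000:01:00.0").
    Nothing is written here; this only builds the plan.
    """
    if num_vfs < 1:
        raise ValueError("num_vfs must be at least 1")
    root = Path(sysfs_root)
    dev = root / "bus" / "pci" / "devices" / bdf
    return [
        # 1) Create the VFs while the PF is still bound to its kernel driver.
        (dev / "sriov_numvfs", str(num_vfs)),
        # 2) Detach the PF from that kernel driver.
        #    Assumption: the driver leaves the VFs enabled on unbind.
        (dev / "driver" / "unbind", bdf),
        # 3) Tell the PCI core to match this device against vfio-pci only.
        (dev / "driver_override", "vfio-pci"),
        # 4) Re-probe the PF so that vfio-pci can claim it.
        (root / "bus" / "pci" / "drivers_probe", bdf),
    ]


def apply_plan(plan):
    """Perform the writes in order (needs root and real hardware)."""
    for path, value in plan:
        path.write_text(value)


if __name__ == "__main__":
    # Dry run: print the writes instead of performing them.
    for path, value in plan_vfs_then_vfio("0000:01:00.0", 4):
        print(f"echo {value} > {path}")
```

Keeping the plan separate from apply_plan() makes the ordering reviewable as a dry run before anything touches sysfs.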