DPDK patches and discussions
From: Luca Boccassi <bluca@debian.org>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: Thomas Monjalon <thomas@monjalon.net>,
	dev@dpdk.org,
	Christian Ehrhardt <christian.ehrhardt@canonical.com>,
	Jerin Jacob Kollanukkaran <jerinj@marvell.com>,
	Vamsi Krishna Attunuru <vattunuru@marvell.com>,
	 arybchenko@solarflare.com, ferruh.yigit@intel.com,
	maxime.coquelin@redhat.com,
	 Stephen Hemminger <stephen@networkplumber.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	david.marchand@redhat.com, ktraynor@redhat.com,
	anatoly.burakov@intel.com, konstantin.ananyev@intel.com,
	honnappa.nagarahalli@arm.com,
	Liang-Min Wang <liang-min.wang@intel.com>,
	Alexander Duyck <alexander.h.duyck@intel.com>,
	 Peter Xu <peterx@redhat.com>, Eric Auger <eric.auger@redhat.com>
Subject: Re: [dpdk-dev] [PATCH v1 1/1] kernel/linux: introduce vfio_pf kernel module
Date: Tue, 05 Nov 2019 10:09:01 +0000
Message-ID: <585584a088a1ff95cb13e0fbf58d7bf70777217c.camel@debian.org>
In-Reply-To: <20191104111610.GC1356@bricha3-MOBL.ger.corp.intel.com>

On Mon, 2019-11-04 at 11:16 +0000, Bruce Richardson wrote:
> On Fri, Nov 01, 2019 at 11:54:45AM +0000, Luca Boccassi wrote:
> > For distros, out-of-tree kernel modules are painful. From my POV, it
> > would be preferable to try and find a solution upstream, even if it is
> > going to be difficult and require a lot of negotiation and work.
> > 
> > 
> 
> I don't think anyone would disagree that getting an upstream in-kernel
> solution is the desired end-state. However, even if we accept that, it
> doesn't necessarily help us, as we need to decide on this support in DPDK
> right now. The factors to take into account are:
> 
> * we don't have definite line-of-sight to an in-kernel solution - it may
>   never come
> * even if it does eventually materialize, it will be months before it is in
>   a released kernel, more months before it makes it to stable/LTS distros,
>   and more months and years thereafter before it actually makes it into
>   deployed systems from users. Having an out-of-tree module in DPDK makes
>   it available to users much, much sooner.
> * there seems to be a real need for this support.
> 
> For me, the key point seems to be the last one - whether the feature is
> needed and likely to be used by a reasonable number of users (i.e. not
> just 1 or 2). If it is needed, then we need to have a path to support it,
> and, right now, I'm not seeing any such path other than having support in
> an out-of-tree module in DPDK itself.
> 
> My 2c.
> 
> /Bruce

That's a fair point. If, as a project, we want to move toward having no
out-of-tree modules, what I'd like to see is a firm statement, a concrete
plan and a commitment via the roadmap to keep working on an upstream
solution, however hard it might be, rather than declaring the job done and
moving on once the out-of-tree module is accepted. With that in place,
having a module in the interim is acceptable to me.

> > On Thu, 2019-10-31 at 18:03 +0100, Thomas Monjalon wrote:
> > > We don't get enough attention on this topic.
> > > Let me rephrase the issue and the proposals with more people Cc'ed.
> > > 
> > > We are talking about SR-IOV VFs in VMs
> > > with a PF managed on the host by DPDK.
> > > The PF driver is either (1) bifurcated (Mellanox case),
> > > or (2) bound to UIO with igb_uio, or (3) bound to VFIO.
> > > 
> > > In case 1, the PF is still managed by a kernel driver, so there is
> > > no issue.
> > > 
> > > In case 2, the PF is managed by UIO.
> > > There is no SR-IOV support in upstream UIO,
> > > but the out-of-tree module igb_uio works.
> > > However we would like to drop this legacy module from DPDK.
> > > Some (most) Linux distributions do not package igb_uio anyway.
> > > The other issue is that igb_uio is using physical addressing,
> > > which is not acceptable with OCTEON TX2 for performance reasons.
> > > 
> > > In case 3, the PF is managed by VFIO. This is the case we want to fix.
> > > VFIO does not allow creating VFs.
> > > The workaround is to create VFs before binding the PF to VFIO.
> > > But since Linux 4.19, VFIO forbids any SR-IOV VF management.
> > > There is a security concern about allowing userspace to manage SR-IOV
> > > VF messages and taking the responsibility for VFs in the guest.
> > > 
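
For reference, here is a rough C sketch of the sysfs sequence behind the
case-3 workaround described above: create the VFs while the PF is still on
its native kernel driver, then move the PF over to vfio-pci. The helper
names are hypothetical and only illustrate the ordering; sriov_numvfs,
driver_override, driver/unbind and drivers_probe are the standard PCI sysfs
files involved. Per the summary above, kernels since 4.19 refuse the final
bind, so treat this as an illustration of why the workaround broke, not as
a recommended procedure.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/pci/devices/<bdf>/<attr>" into buf.
 * Returns 0 on success, -1 if the path would not fit. */
int pci_sysfs_path(char *buf, size_t len, const char *bdf, const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/%s", bdf, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute. Returns 0 or a negative errno.
 * A sysfs store error is reported when the stdio buffer is flushed,
 * so the fclose() result has to be checked too. */
int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int rc = 0;

	if (f == NULL)
		return -errno;
	if (fputs(val, f) == EOF)
		rc = -EIO;
	if (fclose(f) != 0 && rc == 0)
		rc = errno ? -errno : -EIO;
	return rc;
}

/* Hypothetical helper for the case-3 workaround:
 * create VFs on the native driver, then move the PF to vfio-pci. */
int create_vfs_then_bind_vfio(const char *bdf, unsigned int num_vfs)
{
	char path[256], val[16];
	int rc;

	/* 1. Create the VFs while the PF still has its kernel driver.
	 *    To change an existing nonzero VF count, write 0 first. */
	snprintf(val, sizeof(val), "%u", num_vfs);
	if (pci_sysfs_path(path, sizeof(path), bdf, "sriov_numvfs") != 0)
		return -ENAMETOOLONG;
	if ((rc = sysfs_write(path, val)) != 0)
		return rc;

	/* 2. Only allow vfio-pci to bind this PF from now on. */
	if (pci_sysfs_path(path, sizeof(path), bdf, "driver_override") != 0)
		return -ENAMETOOLONG;
	if ((rc = sysfs_write(path, "vfio-pci")) != 0)
		return rc;

	/* 3. Detach the PF from its current driver. */
	if (pci_sysfs_path(path, sizeof(path), bdf, "driver/unbind") != 0)
		return -ENAMETOOLONG;
	if ((rc = sysfs_write(path, bdf)) != 0)
		return rc;

	/* 4. Re-probe. With the 4.19 restriction, vfio-pci is expected to
	 *    decline a PF that has VFs enabled and leave it unbound, so
	 *    check the resulting driver link, not just this return code. */
	return sysfs_write("/sys/bus/pci/drivers_probe", bdf);
}
```

Usage would be something like create_vfs_then_bind_vfio("0000:03:00.0", 4),
run as root; the BDF there is only an example value.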
> > > It is desired to allow the system admin to decide the security levels,
> > > by adding a flag in VFIO "let me manage VFs, I know what I am doing".
> > > Reference of "recent" discussion:
> > > https://lkml.org/lkml/2018/3/6/855
> > > 
> > > 
> > > For now, there is no upstream solution merged.
> > > 
> > > This patch is proposing a solution using an out-of-tree module.
> > > In this case, the admin will decide explicitly to bind the PF to
> > > vfio_pf.
> > > Unfortunately this solution won't work in environments which
> > > forbid any out-of-tree module.
> > > Another concern is that it looks like a DPDK-only solution.
> > > 
> > > We have an issue but we do not want to propose a half-solution
> > > which would harm other projects and users.
> > > So the question is:
> > > Do we accept this patch as a temporary solution?
> > > Or can we get an agreement soon on an upstream kernel solution?
> > > 
> > > Thanks for reading and giving your (clear) opinion.
> > > 
> > > 
> > > 06/09/2019 15:27, Jerin Jacob Kollanukkaran:
> > > > From: Thomas Monjalon <thomas@monjalon.net>
> > > > > 06/09/2019 11:12, vattunuru@marvell.com:
> > > > > > From: Vamsi Attunuru <vattunuru@marvell.com>
> > > > > > 
> > > > > > DPDK use cases such as VF representors or OVS offload call
> > > > > > for both PF and VF PCIe devices to be bound to the vfio-pci
> > > > > > module to enable IOMMU protection.
> > > > > > 
> > > > > > In addition to the vSwitch use case, and unlike other PCI
> > > > > > device classes, network-class PCIe devices place additional
> > > > > > responsibilities on the PF, such as promiscuous mode support.
> > > > > > 
> > > > > > These use cases require VFIO to be bound to the PF and its VF
> > > > > > devices. This is not supported in the Linux kernel because of
> > > > > > a security issue: a DoS is possible if a VF is attached to a
> > > > > > guest over vfio-pci and a netdev kernel driver runs on it,
> > > > > > which is exactly what a VF representor would like to enable.
> > > > > > 
> > > > > > Since we cannot tell whether a vfio-pci-bound VF device runs a
> > > > > > DPDK application or a netdev driver in the guest, we cannot
> > > > > > introduce any scheme to fix the DoS case, and therefore this
> > > > > > cannot be properly supported in the upstream kernel.
> > > > > > 
> > > > > > igb_uio enables such PF and VF binding for non-IOMMU devices,
> > > > > > so that VF representors or OVS offload can run on non-IOMMU
> > > > > > devices, with the DoS vulnerability for a netdev driver on the
> > > > > > VF.
> > > > > > 
> > > > > > This kernel module facilitates enabling SR-IOV on PF devices,
> > > > > > so that both PF and VF devices can run in VFIO mode, with the
> > > > > > same known impact as the igb_uio driver has for non-IOMMU
> > > > > > devices.
> > > > > > 
> > > > > > Signed-off-by: Vamsi Attunuru <vattunuru@marvell.com>
> > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > > > 
> > > > > Sorry, I fail to properly understand the explanation above.
> > > > > Please try to split it into shorter sentences.
> > > > > 
> > > > > About the request to add an out-of-tree Linux kernel driver,
> > > > > I guess Jerin is well aware that we don't want such drivers
> > > > > anymore.
> > > > 
> > > > Yes, I am aware of it. I don't like out-of-tree modules either.
> > > > But in this case, I suggested that Vamsi use an out-of-tree module.
> > > > 
> > > > Let me describe the issue, and let us discuss how to tackle the
> > > > problem:
> > > > 
> > > > # The Linux kernel won't allow a VFIO PF to have SR-IOV enabled.
> > > > 
> > > > Patches and ongoing discussion are here:
> > > > https://patchwork.kernel.org/patch/10522381/
> > > > https://lwn.net/Articles/748526/
> > > > 
> > > > Based on my understanding, the reason for NOT allowing a VFIO PF to
> > > > have SR-IOV enabled is genuine from the kernel point of view, but not
> > > > from the DPDK point of view.
> > > > 
> > > > Here is the sequence to describe the problem:
> > > > 1) Consider that the Linux kernel allowed VFIO PCI SR-IOV enable.
> > > > 2) The PF is bound to vfio-pci.
> > > > 3) Using the SR-IOV infrastructure of the vfio-pci PF driver, VFs
> > > >    are created.
> > > > 4) A DPDK application is bound to the PF and the VF. No issue here.
> > > > 5) Assume a DPDK application is bound to the PF and the VF is bound
> > > >    to a netdev kernel driver. Now there is a genuine concern from the
> > > >    kernel point of view: the DPDK PF can intercept VF mailbox
> > > >    messages and deny the kernel's requests. Or what if the DPDK PF
> > > >    application crashes?
> > > > 
> > > > To avoid case (5), (3) is not allowed in the stock kernel, which
> > > > makes sense IMO.
> > > > 
> > > > Now, from the DPDK PoV, step 5 is valid, as we have rte_flow's VF
> > > > action etc. to enable such a case: the user can program the PF's
> > > > rte_flow rules to steer some traffic to a VF, where the VF can be a
> > > > DPDK application or a Linux kernel netdev driver.
> > > > 
> > > > This patch enables step (3) in order to enable step (5) from the
> > > > DPDK PoV, i.e. DPDK needs to allow the PF to be bound to DPDK with
> > > > VFs.
> > > > 
> > > > Why this issue now:
> > > > - The igb_uio kernel driver is used to enable step (3);
> > > >   see store_max_vfs() in kernel/linux/igb_uio/igb_uio.c.
> > > >   This is fine for non-IOMMU devices, but IOMMU devices need VFIO.
> > > > - We would like to support VFIO for IOMMU protection and enable
> > > >   step (5), as DPDK supports it from the spec level,
> > > >   i.e. we need to fix the feature disparity between IOMMU and
> > > >   non-IOMMU based devices.
> > > > 
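
Along the same lines, a hypothetical userspace sketch of how step (3) is
driven today through igb_uio: the VF count is written to the max_vfs
attribute that igb_uio registers on the PF, and store_max_vfs() turns that
into an SR-IOV enable. The function name and its root parameter are
assumptions made for the example.

```c
#include <errno.h>
#include <stdio.h>

/* Sketch: ask igb_uio to enable num_vfs VFs on the PF at <root>/<bdf>
 * by writing its max_vfs attribute (handled by store_max_vfs()).
 * On a real host root is "/sys/bus/pci/devices"; it is a parameter
 * here only so the helper can be exercised against a scratch directory.
 * Returns 0 or a negative errno. */
int igb_uio_set_max_vfs(const char *root, const char *bdf, unsigned int num_vfs)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "%s/%s/max_vfs", root, bdf);
	FILE *f;

	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;
	f = fopen(path, "w");
	if (f == NULL)
		return -errno;
	if (fprintf(f, "%u", num_vfs) < 0) {
		fclose(f);
		return -EIO;
	}
	/* The store handler runs when the buffer is flushed, so its
	 * error (if any) is reported by fclose(). */
	if (fclose(f) != 0)
		return errno ? -errno : -EIO;
	return 0;
}
```

On a real system the call would be
igb_uio_set_max_vfs("/sys/bus/pci/devices", "0000:03:00.0", 2) with the PF
bound to igb_uio; the BDF is only an example value.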
> > > > Note:
> > > > We may not need a brand new kernel module; we could move this
> > > > logic into igb_uio if maintenance is a concern.
> > > 
> > > 
> > > 
> > > 
> > 
> > -- 
> > Kind regards,
> > Luca Boccassi
-- 
Kind regards,
Luca Boccassi


Thread overview: 23+ messages
2019-09-06  9:12 vattunuru
2019-09-06  9:45 ` Thomas Monjalon
2019-09-06 13:27   ` Jerin Jacob Kollanukkaran
2019-09-25  4:06     ` Vamsi Krishna Attunuru
2019-09-25  7:18       ` Andrew Rybchenko
2019-10-08  5:07     ` Vamsi Krishna Attunuru
2019-10-31 17:03     ` Thomas Monjalon
2019-11-01 11:54       ` Luca Boccassi
2019-11-01 12:12         ` Jerin Jacob
2019-11-04 11:16         ` Bruce Richardson
2019-11-05 10:09           ` Luca Boccassi [this message]
2019-11-06 22:32       ` Alex Williamson
2019-11-07  5:02         ` Jerin Jacob
2019-11-15  6:57           ` Thomas Monjalon
2019-11-15  7:01             ` Jerin Jacob
2019-10-08 15:12 ` Stephen Hemminger
2019-10-08 15:28   ` Jerin Jacob
2019-10-09 23:28     ` Stephen Hemminger
2019-10-10  6:02       ` Jerin Jacob
2019-10-13  7:20         ` Jerin Jacob
2019-10-16 11:37           ` Jerin Jacob
2019-10-23 17:08             ` Jerin Jacob
2019-10-24 11:08       ` Jerin Jacob
