From: Thomas Monjalon
To: "Xia, Chenbo"
Cc: "dev@dpdk.org", "Liang, Cunming", "Wu, Jingjing", "Burakov, Anatoly",
 "Yigit, Ferruh", "mdr@ashroe.eu", "nhorman@tuxdriver.com",
 "Richardson, Bruce", "david.marchand@redhat.com",
 "stephen@networkplumber.org", "Ananyev, Konstantin",
 jgg@nvidia.com, parav@nvidia.com, xuemingl@nvidia.com
Date: Tue, 15 Jun 2021 09:48:24 +0200
Message-ID: <50744230.0ZSezZt4d8@thomas>
References: <20190715075214.16616-6-tiwei.bie@intel.com> <5205443.cqaiBGeHSM@thomas>
Subject: Re: [dpdk-dev] [RFC v3 0/6] Add mdev (Mediated device) support in DPDK

15/06/2021 04:49, Xia, Chenbo:
> From: Thomas Monjalon
> > 01/06/2021 05:06, Chenbo Xia:
> > > Hi everyone,
> > >
> > > This is a draft implementation of the mdev (Mediated device [1])
> > > support in DPDK PCI bus driver.
> > > Mdev is a way to virtualize devices
> > > in Linux kernel. Based on the device-api (mdev_type/device_api),
> > > there could be different types of mdev devices (e.g. vfio-pci).
> >
> > Please could you illustrate with an usage of mdev in DPDK?
> > What does it enable which is not possible today?
>
> The main purpose is for DPDK to drive mdev-based devices, which is not
> possible today.
>
> I'd take PCI devices for an example. Currently DPDK can only drive devices
> of physical pci bus under /sys/bus/pci and kernel exposes the pci devices
> to APP in that way.
>
> But there are PCI devices using vfio-mdev as a software framework to expose
> Mdev to APP under /sys/bus/mdev. Devices could choose this way of virtualizing
> itself to let multiple APPs share one physical device. For example, Intel
> Scalable IOV technology is known to use vfio-mdev as SW framework for Scalable
> IOV enabled devices (and Intel net/crypto/raw devices support this tech). For
> those mdev-based devices, DPDK needs support on the bus layer to scan/plug/probe/..
> them, which is the main effort this patchset does. There are also other devices
> using the vfio-mdev framework, AFAIK, Nvidia's GPU is the first one using mdev
> and Intel's GPU virtualization also uses it.

Yes, mdev was designed for virtualization, I think.
The use of mdev for Scalable IOV without virtualization
may be seen as an abuse by Linux maintainers,
as they currently seem to prefer the auxiliary bus (which is a real bus).

Mellanox got pushback when trying to use mdev for the same purpose
(Scalable Function, also called Sub-Function) in the kernel.
The Linux community decided to use the auxiliary bus.

Any other feedback on the choice of mdev vs aux?
Is there any kernel code supporting this mdev model for Intel devices?

> > > In this patchset, the PCI bus driver is extended to support scanning
> > > and probing the mdev devices whose device-api is "vfio-pci".
> > >
> > >                      +---------+
> > >                      | PCI bus |
> > >                      +----+----+
> > >                           |
> > >          +--------+-------+-------+--------+
> > >          |        |               |        |
> > >   Physical PCI devices ...   Mediated PCI devices ...
> > >
> > > The first four patches in this patchset are mainly preparation of mdev
> > > bus support. The left two patches are the key implementation of mdev bus.
> > >
> > > The implementation of mdev bus in DPDK has several options:
> > >
> > > 1: Embed mdev bus in current pci bus
> > >
> > >    This patchset takes this option for an example. Mdev has several
> > >    device types: pci/platform/amba/ccw/ap. DPDK currently only cares
> > >    pci devices in all mdev device types so we could embed the mdev bus
> > >    into current pci bus. Then pci bus with mdev support will scan/plug/
> > >    unplug/.. not only normal pci devices but also mediated pci devices.
> >
> > I think it is a different bus.
> > It would be cleaner to not touch the PCI bus.
> > Having a separate bus will allow an easy way to identify a device
> > with the new generic devargs syntax, example:
> > 	bus=mdev,uuid=XXX
> > or more complex:
> > 	bus=mdev,uuid=XXX/class=crypto/driver=qat,foo=bar
>
> OK. Agree on cleaner to not touch PCI bus. And there may also be a 'type=pci'
> as mdev has several types in its definition (pci/ap/platform/ccw/...).
>
> > > 2: A new mdev bus that scans mediated pci devices and probes mdev driver to
> > >    plug-in pci devices to pci bus
> > >
> > >    If we took this option, a new mdev bus will be implemented to scan
> > >    mediated pci devices and a new mdev driver for pci devices will be
> > >    implemented in pci bus to plug-in mediated pci devices to pci bus.
> > >
> > >    Our RFC v1 takes this option:
> > >    http://patchwork.dpdk.org/project/dpdk/cover/20190403071844.21126-1-tiwei.bie@intel.com/
> > >
> > >    Note that: for either option 1 or 2, device drivers do not know the
> > >    implementation difference but only use structs/functions exposed by
> > >    pci bus.
> > >    Mediated pci devices are different from normal pci devices
> > >    on: 1. Mediated pci devices use UUID as address but normal ones use BDF.
> > >    2. Mediated pci devices may have some capabilities that normal pci
> > >    devices do not have. For example, mediated pci devices could have
> > >    regions that have sparse mmap capability, which allows a region to have
> > >    multiple mmap areas. Another example is mediated pci devices may have
> > >    regions/part of regions not mmaped but need to access them. Above
> > >    difference will change the current ABI (i.e., struct rte_pci_device).
> > >    Please check 5th and 6th patch for details.
> > >
> > > 3. A brand new mdev bus that does everything
> > >
> > >    This option will implement a new and standalone mdev bus. This option
> > >    does not need any changes in current pci bus but only needs some shared
> > >    code (linux vfio part) in pci bus. Drivers of devices that support mdev
> > >    will register itself as a mdev driver and do not rely on pci bus anymore.
> > >    This option, IMHO, will make the code clean. The only potential problem
> > >    may be code duplication, which could be solved by making code of linux
> > >    vfio part of pci bus common and shared.
> >
> > Yes I prefer this third option.
> > We can find an elegant way of sharing some VFIO code between buses.
>
> Yes, I have not thought about the details of the code sharing but will try to make
> it elegant.

Great, thanks.
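
For illustration only, here is a rough sketch of the addressing difference
noted in the cover letter (mediated devices identified by UUID, physical PCI
devices by BDF). It is not taken from the patchset and is not DPDK API: the
enum and the function names are hypothetical, only the full "domain:bus:dev.fn"
BDF form is handled, and every field is simply checked for hex digits without
range validation.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical address kinds; illustrative names, not DPDK API. */
enum dev_addr_kind {
	DEV_ADDR_INVALID = 0,
	DEV_ADDR_PCI_BDF,   /* e.g. "0000:3b:00.1" */
	DEV_ADDR_MDEV_UUID, /* e.g. "83b8f4f2-509f-382f-3c1e-e6bfe0fa1001" */
};

/*
 * Compare s against a pattern in which 'x' stands for exactly one hex
 * digit and any other character must match literally.
 */
static bool
match_hex_pattern(const char *s, const char *pattern)
{
	size_t i;

	if (strlen(s) != strlen(pattern))
		return false;
	for (i = 0; pattern[i] != '\0'; i++) {
		if (pattern[i] == 'x') {
			if (!isxdigit((unsigned char)s[i]))
				return false;
		} else if (s[i] != pattern[i]) {
			return false;
		}
	}
	return true;
}

/*
 * Simplified classification: the field widths are checked, the field
 * values are not (for instance, a PCI function above 7 is accepted).
 */
enum dev_addr_kind
classify_dev_addr(const char *s)
{
	if (s == NULL)
		return DEV_ADDR_INVALID;
	if (match_hex_pattern(s, "xxxx:xx:xx.x"))
		return DEV_ADDR_PCI_BDF;
	if (match_hex_pattern(s, "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"))
		return DEV_ADDR_MDEV_UUID;
	return DEV_ADDR_INVALID;
}
```

With a separate mdev bus, a check of this kind would presumably live in that
bus's own address parsing, so the PCI bus could keep accepting BDF only.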