From: David Marchand
Date: Fri, 14 Jun 2019 10:42:22 +0200
To: "Walker, Benjamin"
Cc: "jerinj@marvell.com", "Burakov, Anatoly", "dev@dpdk.org", Maxime Coquelin, hemant.agrawal@nxp.com, shreyansh.jain@nxp.com, Rosen Xu, Gaetan Rivet, Stephen Hemminger, "Yigit, Ferruh", Thomas Monjalon, Yongseok Koh
References: <20190530174819.1160221-1-benjamin.walker@intel.com>
MIME-Version: 1.0
Content-Type: 
text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] eal/pci: Improve automatic selection of IOVA mode
List-Id: DPDK patches and discussions
Sender: "dev"

On Mon, Jun 3, 2019 at 6:44 PM Walker, Benjamin wrote:

> On Mon, 2019-06-03 at 12:48 +0200, David Marchand wrote:
> > Hello,
> >
> > On Thu, May 30, 2019 at 7:48 PM Ben Walker wrote:
> > > In SPDK, not all drivers are registered with DPDK at start up time.
> > > Previously, that meant DPDK always chose to set itself up in IOVA_PA
> > > mode. Instead, when the correct iova choice is unclear based on the
> > > devices and drivers known to DPDK at start up time, use other heuristics
> > > (such as whether /proc/self/pagemap is accessible) to make a better
> > > choice.
> > >
> > > This enables SPDK to run as an unprivileged user again without requiring
> > > users to explicitly set the iova mode on the command line.
> > >
> >
> > Interesting, I got a bz on something similar the day you sent this
> > patchset ;-)
> >
> > - When a dpdk process is started, either it has access to physical addresses
> > or not, and this won't change for the rest of its life.
> > Your fix on defaulting to VA based on a rte_eal_using_phys_addrs() check
> > makes sense to me.
> > It is the most encountered situation when running ovs as non root on recent
> > kernels.
> >
> > - However, I fail to see the need for all of this detection code wrt drivers
> > and devices.
> >
> > On one side of the equation, when dpdk starts, it checks physical address
> > availability.
> > On the other side of the equation, we have the drivers that will be invoked
> > when probing devices (either at dpdk init, or when hotplugging a device).
> >
> > At this point, the probing call should check the driver requirement wrt the
> > kernel driver the device is attached to.
> > If this requirement is not fulfilled, then the probing fails.
> >
> > - This leaves the --iova-va forcing option.
> > Why do we need it?
> > If we don't have access to physical addresses, no choice but run in VA mode.
> > If we have access to physical addresses, the only case would be that you
> > want to downgrade from PA to VA.
> > But well, your process can still access it, not sure what the benefit is.
>
> All of the complexity here, at least as far as I understand it, stems from
> supporting hot insert of devices. This is very important to SPDK because
> storage devices get hot inserted all the time, so we very much appreciate that
> DPDK has put in so much effort in this area and continues to accept our
> patches to improve it. I know hot insert is not nearly as important for
> network devices.
>
> When DPDK starts up, it needs to select whether to use virtual addresses or
> physical addresses in its memory maps. It can do that by answering the
> following questions:
>
> 1. Does the system only have buses that support an IOMMU?
> 2. Is the IOMMU sufficiently fast for the use case?
> 3. Will all of the devices that will be used with DPDK throughout the
> application's lifetime work with an IOMMU?
>
> If these three things are true, then the best choice is to use virtual
> addresses in the memory translations. However, if any of the above are not
> true it needs to fall back to physical addresses.
>
> #1 is checked by simply asking all of the buses, which are known up front.
> #2 is just assumed to be true. But #3 is not possible to check fully because
> of hot insert.
>
> The code currently approximates the #3 check by looking at the devices present
> at initialization time.
> If a device exists that's bound to vfio-pci, and no other devices exist that
> are bound to a uio driver, and DPDK has a registered driver that's actually
> going to load against the vfio-pci devices, then it will elect to use virtual
> addresses. This is purely a heuristic - it's not a definitive answer because
> the user could later hot insert a device that gets bound to uio.
>
> The user, of course, knows the answer to which addressing scheme to use
> typically. For example, these checks assume #2 is true, but there may be
> hardware implementations where it is not and the user wants to force physical
> addresses. Or the user may know that they are going to hot insert a device at
> run time that doesn't work with the IOMMU. That's why it's important to
> maintain the ability for the user to override the default heuristic's decision
> via the command line.
>
> My patch series is simply improving the heuristic in a few ways. First,
> previously each bus when queried would return either virtual or physical
> addresses as its choice. However, often the bus just does not have enough
> information to formulate any preference at all (and PCI was defaulting to
> physical addresses in this case). Instead, I made it so that the bus can
> return that it doesn't care, which pushes the decision up to a higher level.
> That higher level then makes the decision by checking whether it can access
> /proc/self/pagemap. Second, I narrowed the uio check such that physical
> addresses will only be selected if a device bound to uio exists and there is a
> driver registered to use it. Previously if any device was bound to uio it
> would select physical addresses, even if DPDK never ended up loading against
> that device.
>
> I think these two things make the heuristic choose the right thing more often,
> but it still won't always get it right so the command line option needs to
> remain.
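
For readers following the thread, the top-level decision described in the quoted message can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch series: the names `iova_mode`, `combine_bus_modes` and `select_iova_mode` are hypothetical, the rule that any bus asking for PA wins over VA is an assumption, and so is picking PA when no bus has a preference and physical addresses are accessible. The physical-address check itself (e.g. via /proc/self/pagemap) is passed in as a plain boolean.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bus preference: don't care, PA or VA. */
enum iova_mode { IOVA_DC = 0, IOVA_PA = 1, IOVA_VA = 2 };

/*
 * Combine the per-bus answers. Assumption: a single bus that needs
 * physical addresses forces PA; otherwise VA if any bus asked for it;
 * otherwise nobody cares.
 */
static enum iova_mode
combine_bus_modes(const enum iova_mode *buses, size_t n)
{
    bool want_va = false;

    for (size_t i = 0; i < n; i++) {
        if (buses[i] == IOVA_PA)
            return IOVA_PA;
        if (buses[i] == IOVA_VA)
            want_va = true;
    }
    return want_va ? IOVA_VA : IOVA_DC;
}

/*
 * Top-level choice: a command-line override wins, then the buses,
 * and only when no bus has a preference does the result depend on
 * whether physical addresses are accessible to the process.
 */
static enum iova_mode
select_iova_mode(enum iova_mode forced, const enum iova_mode *buses,
                 size_t n, bool phys_addrs_available)
{
    if (forced != IOVA_DC)
        return forced;

    enum iova_mode mode = combine_bus_modes(buses, n);
    if (mode != IOVA_DC)
        return mode;

    return phys_addrs_available ? IOVA_PA : IOVA_VA;
}
```

With this shape, an unprivileged process whose buses all report "don't care" ends up in VA mode without any command-line option, which is the SPDK case the patch description mentions.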

After some exchanges offlist, on irc and taking some time looking at the code,
here are my conclusions.
Copying bus drivers maintainers/connoisseurs.

We have cases where we prefer using VA even if PA are available (for fslmc,
where translating from iova as PA to VA is more costly).

I worked on Ben's patches and summarised the situation as two main issues with
the current code:

- physical address availability is not taken into account early enough in EAL
  init, and we end up with the memory subsystem complaining later, which is not
  that user friendly. A side effect is that the init could have fallen back to
  using VA in most cases if there were no strong requirement on PA.
- the pci bus driver looks at all devices on the system, with no consideration
  of the pci white/blacklist and no consideration of whether dpdk has a driver
  that supports the device.

I prepared a new series that I will send shortly.
I am currently considering the backport potential for it.

Thoughts?
Else, reviews are welcome.

Thanks.

-- 
David Marchand
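
To make the second issue concrete, here is a minimal sketch of a PCI bus preference that skips devices excluded by the white/blacklist and devices for which no DPDK driver is registered. It is an illustration only, not the series mentioned above: `pci_dev_info`, its fields, `kdrv` and `pci_bus_iova_pref` are hypothetical names, and real driver requirements are reduced here to the kernel driver the device is bound to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bus answer: don't care, physical or virtual addresses. */
enum iova_pref { PREF_DC, PREF_PA, PREF_VA };

/* Hypothetical view of the kernel driver a device is bound to. */
enum kdrv { KDRV_NONE, KDRV_UIO, KDRV_VFIO };

struct pci_dev_info {
    enum kdrv kernel_driver;
    bool allowed;          /* passes the pci white/blacklist */
    bool has_dpdk_driver;  /* a registered DPDK driver supports it */
};

/*
 * Only devices that are allowed and that a DPDK driver will actually
 * use influence the result. A relevant uio-bound device forces PA,
 * relevant vfio-bound devices alone suggest VA, and with no relevant
 * device the bus expresses no preference.
 */
static enum iova_pref
pci_bus_iova_pref(const struct pci_dev_info *devs, size_t n)
{
    bool vfio_seen = false;

    for (size_t i = 0; i < n; i++) {
        if (!devs[i].allowed || !devs[i].has_dpdk_driver)
            continue;
        if (devs[i].kernel_driver == KDRV_UIO)
            return PREF_PA;
        if (devs[i].kernel_driver == KDRV_VFIO)
            vfio_seen = true;
    }
    return vfio_seen ? PREF_VA : PREF_DC;
}
```

The point of the filter is that a uio-bound device which is blacklisted, or which no DPDK driver will ever claim, no longer pushes the whole process into PA mode.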