From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Burakov, Anatoly"
Date: Fri, 12 Jul 2019 13:58:46 +0100
To: Thomas Monjalon
Cc: David Marchand, dev@dpdk.org, jerinj@marvell.com, John McNamara,
 Marko Kovacevic, Igor Russkikh, Pavel Belous, Ajit Khaparde,
 Somnath Kotur, Wenzhuo Lu, John Daley, Hyong Youb Kim, Qi Zhang,
 Xiao Wang, Beilei Xing, Jingjing Wu, Qiming Yang, Konstantin Ananyev,
 Matan Azrad, Shahaf Shuler, Yongseok Koh, Viacheslav Ovsiienko,
 Alejandro Lucero, Nithin Dabilpuram, Kiran Kumar K, Rasesh Mody,
 Shahed Shaikh, Bruce Richardson, alialnu@mellanox.com, aconole@redhat.com
Subject: Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
Message-ID: <4998e025-ed05-3a56-1e4c-c053cf67a7c4@intel.com>
In-Reply-To: <2927698.45TxNz31xh@xps>
References: <1562795329-16652-1-git-send-email-david.marchand@redhat.com>
 <1562795329-16652-3-git-send-email-david.marchand@redhat.com>
 <615dc23b-639a-01a0-4871-4c168e64bbb9@intel.com>
 <2927698.45TxNz31xh@xps>
List-Id: DPDK patches and discussions

On 12-Jul-19 1:43 PM, Thomas Monjalon wrote:
> 12/07/2019 13:03, Burakov, Anatoly:
>> On 10-Jul-19 10:48 PM, David Marchand wrote:
>>> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
>>> was intended to mean "driver only supports VA" but had been understood
>>> as "driver supports both PA and VA" by most net drivers and used to let
>>> dpdk processes to run as non root (which do not have access to physical
>>> addresses on recent kernels).
>>>
>>> The check on physical addresses actually closed the gap for those
>>> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
>>> flag can retain its intended meaning.
>>> Document explicitly its meaning.
>>>
>>
>> So, we always assume that all devices support both IOVA as PA and IOVA
>> as VA by default. Well, as long as it's understood and documented :)
>
> Yes
> Please make sure it is well documented.
>
>> Unless...
>>
>>
>>
>>
>>> +
>>> +IOVA Mode is selected by considering what the current usable Devices on the
>>> +system requires and/or supports.
>>> +
>>> +Below is the 2-step heuristic for this choice.
>>> +
>>> +For the first step, EAL asks each bus its requirement in terms of IOVA mode
>>> +and decides on a preferred IOVA mode.
>>> +
>>> +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
>>> +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
>>> +- if all buses report RTE_IOVA_DC, no bus expressed a preference, then the
>>> +  preferred mode is RTE_IOVA_DC,
>>> +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
>>> +  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>>> +  check on Physical Addresses availability),
>>> +
>>> +The second step is checking if the preferred mode complies with the Physical
>>> +Addresses availability since those are only available to root user in recent
>>> +kernels.
>>> +
>>> +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
>>> +  Addresses, then EAL init will fail early, since later probing of the devices
>>> +  would fail anyway,
>>> +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
>>> +  availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
>>> +  In the case when the buses had disagreed on the IOVA Mode at the first step,
>>> +  part of the buses won't work because of this decision.
>>
>> Is there any specific reason why we always prefer PA if physical
>> addresses are available? Since we're already assuming that all devices
>> support PA and VA anyway, what's the harm in enabling VA by default?
>
> If PA is available, it means we are running as root.
> We can assume that using root is a choice, probably related
> to a preference for PA.
>
>> I seem to recall there were some concerns around SPDK and PA address
>> availability - doesn't that mean that the assumption regarding PA and VA
>> mode always being supported doesn't actually hold in practice?
>>
>> By the way, the reason I'm harping away on IOVA as VA being the default
>> is because having IOVA as PA is not a free (as in beer) choice - we
>> sacrifice some usability by doing that.
>> Right now, by default, mempool
>> will ask for IOVA-contiguous memory first, and this is slow in IOVA as
>> PA mode - meaning, e.g. testpmd startup time is greatly increased for
>> smaller page sizes because IOVA as PA mode is the default in DPDK.
>>
>> I would also like to steer people away from using real physical
>> addresses because doing so while requiring lots of IOVA contiguous
>> memory also requires legacy mem mode, which I would rather people not
>> use and grow dependent on, and would like to remove it at some point as
>> it adds a lot of complexity for a corner case.
>
> That's why we should encourage people not to run as root.
> We need more documentation about how to run as normal user.
>
>> So, picking address mode is not *just* about whether the device supports
>> them - it has usability implications as well.
>
> If we consider running as root an exception, then it makes
> sense to pick the address mode which fits this exception (PA).
>

When you put it that way, that does indeed make sense. Typically though,
developers tend to run as root. I shall hereby stop doing so :)

-- 
Thanks,
Anatoly
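
For reference, the 2-step heuristic quoted from the patch can be sketched in C roughly as follows. This is only an illustration of the quoted text, not the actual EAL code: `enum iova_mode`, `preferred_iova_mode()` and `select_iova_mode()` are hypothetical names standing in for the real `RTE_IOVA_*` values and bus/EAL functions. One assumption goes beyond the quoted text: a bus reporting DC is treated as having no opinion, so a mix of PA and DC yields PA (and likewise for VA).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTE_IOVA_* values discussed in the thread. */
enum iova_mode { IOVA_DC = 0, IOVA_PA, IOVA_VA };

/*
 * Step 1: combine what each bus reports into a preferred mode.
 * Assumption (not stated in the quoted patch): a bus reporting DC has no
 * opinion, so only PA and VA votes are counted.
 */
enum iova_mode
preferred_iova_mode(const enum iova_mode *bus_modes, size_t n)
{
	bool want_pa = false;
	bool want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}

	if (want_pa && !want_va)
		return IOVA_PA;
	if (want_va && !want_pa)
		return IOVA_VA;
	/* No preference at all, or the buses disagree. */
	return IOVA_DC;
}

/*
 * Step 2: check the preferred mode against physical address availability.
 * Returns 0 and stores the final mode in *out, or -1 when PA is required
 * but physical addresses are not accessible (init should fail early).
 */
int
select_iova_mode(const enum iova_mode *bus_modes, size_t n,
		 bool phys_addrs_available, enum iova_mode *out)
{
	enum iova_mode mode = preferred_iova_mode(bus_modes, n);

	if (mode == IOVA_PA && !phys_addrs_available)
		return -1;

	/* DC is resolved to PA when physical addresses are usable, else VA. */
	if (mode == IOVA_DC)
		mode = phys_addrs_available ? IOVA_PA : IOVA_VA;

	*out = mode;
	return 0;
}
```

With this sketch, buses that disagree (one PA, one VA) end up in PA when running with access to physical addresses and in VA otherwise, which is the case where, as the patch text says, part of the buses won't work.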