DPDK patches and discussions
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
To: Thomas Monjalon <thomas@monjalon.net>
Cc: David Marchand <david.marchand@redhat.com>,
	dev@dpdk.org, jerinj@marvell.com,
	John McNamara <john.mcnamara@intel.com>,
	Marko Kovacevic <marko.kovacevic@intel.com>,
	Igor Russkikh <igor.russkikh@aquantia.com>,
	Pavel Belous <pavel.belous@aquantia.com>,
	Ajit Khaparde <ajit.khaparde@broadcom.com>,
	Somnath Kotur <somnath.kotur@broadcom.com>,
	Wenzhuo Lu <wenzhuo.lu@intel.com>,
	John Daley <johndale@cisco.com>,
	Hyong Youb Kim <hyonkim@cisco.com>,
	Qi Zhang <qi.z.zhang@intel.com>,
	Xiao Wang <xiao.w.wang@intel.com>,
	Beilei Xing <beilei.xing@intel.com>,
	Jingjing Wu <jingjing.wu@intel.com>,
	Qiming Yang <qiming.yang@intel.com>,
	Konstantin Ananyev <konstantin.ananyev@intel.com>,
	Matan Azrad <matan@mellanox.com>,
	Shahaf Shuler <shahafs@mellanox.com>,
	Yongseok Koh <yskoh@mellanox.com>,
	Viacheslav Ovsiienko <viacheslavo@mellanox.com>,
	Alejandro Lucero <alejandro.lucero@netronome.com>,
	Nithin Dabilpuram <ndabilpuram@marvell.com>,
	Kiran Kumar K <kirankumark@marvell.com>,
	Rasesh Mody <rmody@marvell.com>,
	Shahed Shaikh <shshaikh@marvell.com>,
	Bruce Richardson <bruce.richardson@intel.com>,
	alialnu@mellanox.com, aconole@redhat.com
Subject: Re: [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers
Date: Fri, 12 Jul 2019 13:58:46 +0100
Message-ID: <4998e025-ed05-3a56-1e4c-c053cf67a7c4@intel.com>
In-Reply-To: <2927698.45TxNz31xh@xps>

On 12-Jul-19 1:43 PM, Thomas Monjalon wrote:
> 12/07/2019 13:03, Burakov, Anatoly:
>> On 10-Jul-19 10:48 PM, David Marchand wrote:
>>> The incriminated commit broke the use of RTE_PCI_DRV_IOVA_AS_VA which
>>> was intended to mean "driver only supports VA" but had been understood
>>> as "driver supports both PA and VA" by most net drivers and used to let
>>> DPDK processes run as non-root (which do not have access to physical
>>> addresses on recent kernels).
>>>
>>> The check on physical addresses actually closed the gap for those
>>> drivers. We don't need to mark them with RTE_PCI_DRV_IOVA_AS_VA and this
>>> flag can retain its intended meaning.
>>> Explicitly document its meaning.
>>>
>>
>> So, we always assume that all devices support both IOVA as PA and IOVA
>> as VA by default. Well, as long as it's understood and documented :)
> 
> Yes
> Please make sure it is well documented.
> 
>> Unless...
>>
>>
>> <snip>
>>
>>> +
>>> +IOVA Mode is selected by considering what the current usable Devices on the
>>> +system require and/or support.
>>> +
>>> +Below is the 2-step heuristic for this choice.
>>> +
>>> +For the first step, EAL asks each bus its requirement in terms of IOVA mode
>>> +and decides on a preferred IOVA mode.
>>> +
>>> +- if all buses report RTE_IOVA_PA, then the preferred IOVA mode is RTE_IOVA_PA,
>>> +- if all buses report RTE_IOVA_VA, then the preferred IOVA mode is RTE_IOVA_VA,
>>> +- if all buses report RTE_IOVA_DC (no bus expressed a preference), then the
>>> +  preferred mode is RTE_IOVA_DC,
>>> +- if the buses disagree (at least one wants RTE_IOVA_PA and at least one wants
>>> +  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>>> +  check on Physical Addresses availability),
>>> +
>>> +The second step is checking if the preferred mode complies with the Physical
>>> +Addresses availability since those are only available to root user in recent
>>> +kernels.
>>> +
>>> +- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
>>> +  Addresses, then EAL init will fail early, since later probing of the devices
>>> +  would fail anyway,
>>> +- if the preferred mode is RTE_IOVA_DC then based on the Physical Addresses
>>> +  availability, the preferred mode is adjusted to RTE_IOVA_PA or RTE_IOVA_VA.
>>> +  In the case when the buses had disagreed on the IOVA Mode at the first step,
>>> +  part of the buses won't work because of this decision.
>>
>> Is there any specific reason why we always prefer PA if physical
>> addresses are available? Since we're already assuming that all devices
>> support PA and VA anyway, what's the harm in enabling VA by default?
> 
> If PA is available, it means we are running as root.
> We can assume that using root is a choice, probably related
> to a preference for PA.
> 
>> I seem to recall there were some concerns around SPDK and PA address
>> availability - doesn't that mean that the assumption regarding PA and VA
>> mode always being supported doesn't actually hold in practice?
>>
>> By the way, the reason i'm harping away on IOVA as VA being the default
>> is because having IOVA as PA is not a free (as in beer) choice - we
>> sacrifice some usability by doing that. Right now, by default, mempool
>> will ask for IOVA-contiguous memory first, and this is slow in IOVA as
>> PA mode - meaning, e.g. testpmd startup time is greatly increased for
>> smaller page sizes because IOVA as PA mode is the default in DPDK.
>>
>> I would also like to steer people away from using real physical
>> addresses because doing so while requiring lots of IOVA contiguous
>> memory also requires legacy mem mode, which i would rather people not
>> use and grow dependent on, and would like to remove it at some point as
>> it adds a lot of complexity for a corner case.
> 
> That's why we should do more to encourage not running as root.
> We need more documentation about how to run as normal user.
> 
>> So, picking address mode is not *just* about whether the device supports
>> them - it has usability implications as well.
> 
> If we consider running as root an exception, then it makes
> sense to pick address mode which fits this exception (PA).
> 

When you put it that way, that does indeed make sense. Typically though, 
developers tend to run as root. I shall hereby stop doing so :)

-- 
Thanks,
Anatoly
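
For reference, the 2-step heuristic quoted from the patch above can be
sketched roughly as follows. This is only an illustration of the quoted
text, not the EAL code: the enum and both function names are hypothetical
local stand-ins (RTE_IOVA_DC/PA/VA are mirrored by IOVA_DC/PA/VA), and the
handling of a mix of "don't care" buses with a single preference is an
assumption, since the quoted text only covers the unanimous and the
conflicting cases.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for RTE_IOVA_DC / RTE_IOVA_PA / RTE_IOVA_VA. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Step 1: combine what each bus reports into a preferred mode.
 * Assumption: a bus reporting IOVA_DC does not constrain the result.
 */
static enum iova_mode
preferred_iova_mode(const enum iova_mode *bus_modes, size_t n)
{
	bool want_pa = false, want_va = false;

	for (size_t i = 0; i < n; i++) {
		if (bus_modes[i] == IOVA_PA)
			want_pa = true;
		else if (bus_modes[i] == IOVA_VA)
			want_va = true;
	}
	if (want_pa && !want_va)
		return IOVA_PA;
	if (want_va && !want_pa)
		return IOVA_VA;
	/* Either nobody cares, or the buses disagree. */
	return IOVA_DC;
}

/*
 * Step 2: reconcile the preferred mode with physical address availability.
 * Returns 0 and stores the final mode in *out, or -1 when init should fail
 * because PA is required but physical addresses are not accessible.
 */
static int
select_iova_mode(enum iova_mode preferred, bool phys_addrs_available,
		 enum iova_mode *out)
{
	if (preferred == IOVA_PA && !phys_addrs_available)
		return -1;
	if (preferred == IOVA_DC)
		preferred = phys_addrs_available ? IOVA_PA : IOVA_VA;
	*out = preferred;
	return 0;
}
```

Under this sketch, the "prefer PA when running as root" behaviour discussed
above is the single ternary in step 2; defaulting to VA instead would mean
changing only that line.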


Thread overview: 57+ messages
2019-07-10 21:48 [dpdk-dev] [PATCH 0/2] Fixes on IOVA mode selection David Marchand
2019-07-10 21:48 ` [dpdk-dev] [PATCH 1/2] Revert "bus/pci: add Mellanox kernel driver type" David Marchand
2019-07-16 10:37   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-10 21:48 ` [dpdk-dev] [PATCH 2/2] eal: fix IOVA mode selection as VA for pci drivers David Marchand
2019-07-11 14:40   ` Thomas Monjalon
2019-07-12  8:05     ` Jerin Jacob Kollanukkaran
2019-07-12 11:03   ` Burakov, Anatoly
2019-07-12 12:43     ` Thomas Monjalon
2019-07-12 12:58       ` Burakov, Anatoly [this message]
2019-07-12 13:19         ` Bruce Richardson
2019-07-15 14:26       ` Jerin Jacob Kollanukkaran
2019-07-15 15:03         ` Thomas Monjalon
2019-07-15 15:35           ` Jerin Jacob Kollanukkaran
2019-07-15 16:06             ` Thomas Monjalon
2019-07-15 16:27               ` Jerin Jacob Kollanukkaran
2019-07-16 13:46 ` [dpdk-dev] [PATCH v2 0/4] Fixes on IOVA mode selection jerinj
2019-07-16 13:46   ` [dpdk-dev] [PATCH v2 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj
2019-07-16 13:46   ` [dpdk-dev] [PATCH v2 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj
2019-07-16 14:26     ` Burakov, Anatoly
2019-07-16 15:07       ` Jerin Jacob Kollanukkaran
2019-07-16 13:46   ` [dpdk-dev] [PATCH v2 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj
2019-07-16 13:46   ` [dpdk-dev] [PATCH v2 4/4] eal: select IOVA mode as VA for default case jerinj
2019-07-16 14:33     ` Burakov, Anatoly
2019-07-17  8:33       ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-17 12:38         ` Burakov, Anatoly
2019-07-17 14:04           ` Jerin Jacob Kollanukkaran
2019-07-18  6:45   ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection jerinj
2019-07-18  6:45     ` [dpdk-dev] [PATCH v3 1/4] Revert "bus/pci: add Mellanox kernel driver type" jerinj
2019-07-18  6:45     ` [dpdk-dev] [PATCH v3 2/4] eal: fix IOVA mode selection as VA for pci drivers jerinj
2019-07-18  6:45     ` [dpdk-dev] [PATCH v3 3/4] eal: change RTE_PCI_DRV_IOVA_AS_VA flag name jerinj
2019-07-18  6:45     ` [dpdk-dev] [PATCH v3 4/4] eal: select IOVA mode as VA for default case jerinj
2019-07-22 11:28     ` [dpdk-dev] [PATCH v3 0/4] Fixes on IOVA mode selection David Marchand
2019-07-22 12:56 ` [dpdk-dev] [PATCH v4 " David Marchand
2019-07-22 12:56   ` [dpdk-dev] [PATCH v4 1/4] Revert "bus/pci: add Mellanox kernel driver type" David Marchand
2019-07-22 12:56   ` [dpdk-dev] [PATCH v4 2/4] eal: fix IOVA mode selection as VA for PCI drivers David Marchand
2019-11-25  9:33     ` Ferruh Yigit
2019-11-25 10:22       ` Thomas Monjalon
2019-11-25 12:03         ` Ferruh Yigit
2019-11-25 12:36           ` David Marchand
2019-11-25 12:58             ` Burakov, Anatoly
2019-11-25 14:29               ` Thomas Monjalon
2019-11-25 11:07       ` Jerin Jacob
2019-07-22 12:56   ` [dpdk-dev] [PATCH v4 3/4] drivers: change IOVA as VA PCI flag name David Marchand
2019-07-22 12:56   ` [dpdk-dev] [PATCH v4 4/4] eal: select IOVA as VA mode for default case David Marchand
2019-07-22 15:53   ` [dpdk-dev] [PATCH v4 0/4] Fixes on IOVA mode selection Thomas Monjalon
2019-07-23  3:35     ` Stojaczyk, Dariusz
2019-07-23  4:18       ` Jerin Jacob Kollanukkaran
2019-07-23  4:54         ` Stojaczyk, Dariusz
2019-07-23  5:27           ` Jerin Jacob Kollanukkaran
2019-07-23  7:21             ` Thomas Monjalon
2019-07-23  9:57             ` Burakov, Anatoly
2019-07-23 10:25               ` Thomas Monjalon
2019-07-23 13:56                 ` Burakov, Anatoly
2019-07-23 14:24                   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-23 14:29                   ` [dpdk-dev] " Burakov, Anatoly
2019-07-23 14:36                     ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-07-23 15:47                       ` Burakov, Anatoly
