DPDK patches and discussions
From: "Stojaczyk, Dariusz" <dariusz.stojaczyk@intel.com>
To: "Burakov, Anatoly" <anatoly.burakov@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>,
	Santosh Shukla <santosh.shukla@caviumnetworks.com>,
	"Hemant Agrawal" <hemant.agrawal@nxp.com>,
	Jerin Jacob <jerin.jacob@caviumnetworks.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>,
	Chas Williams <chas3@att.com>
Subject: Re: [dpdk-dev] [PATCH v2] eal/bus: use RTE_IOVA_PA only if phys addresses are available
Date: Mon, 17 Sep 2018 13:06:21 +0000
Message-ID: <FBE7E039FA50BF47A673AD0BD3CD56A8461E8117@HASMSX105.ger.corp.intel.com>
In-Reply-To: <f59b3806-9416-d178-49cc-424edea66a63@intel.com>



> -----Original Message-----
> From: Burakov, Anatoly
> Sent: Monday, September 17, 2018 12:34 PM
> To: Stojaczyk, Dariusz <dariusz.stojaczyk@intel.com>; dev@dpdk.org;
> Santosh Shukla <santosh.shukla@caviumnetworks.com>; Hemant Agrawal
> <hemant.agrawal@nxp.com>; Jerin Jacob
> <jerin.jacob@caviumnetworks.com>
> Cc: Maxime Coquelin <maxime.coquelin@redhat.com>; Chas Williams
> <chas3@att.com>
> Subject: Re: [PATCH v2] eal/bus: use RTE_IOVA_PA only if phys addresses
> are available
> 
> On 07-Sep-18 4:58 PM, Darek Stojaczyk wrote:
> > When neither RTE_IOVA_VA nor RTE_IOVA_PA was explicitly requested,
> > DPDK would currently fall back to the default RTE_IOVA_PA mode and
> > possibly encounter a failure later on if running as a non-privileged
> > user. Attempting to use RTE_IOVA_VA if no phys addresses are available
> > may help in this case.
> >
> > Signed-off-by: Darek Stojaczyk <dariusz.stojaczyk@intel.com>
> > ---
> > Changes since v1:
> >   * added a missing rte_memory.h include
> >
> >   lib/librte_eal/common/eal_common_bus.c | 19 +++++++++++++++----
> >   1 file changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/lib/librte_eal/common/eal_common_bus.c b/lib/librte_eal/common/eal_common_bus.c
> > index 0943851cc..68c581b8a 100644
> > --- a/lib/librte_eal/common/eal_common_bus.c
> > +++ b/lib/librte_eal/common/eal_common_bus.c
> > @@ -37,6 +37,7 @@
> >   #include <rte_bus.h>
> >   #include <rte_debug.h>
> >   #include <rte_string_fns.h>
> > +#include <rte_memory.h>
> >
> >   #include "eal_private.h"
> >
> > @@ -236,9 +237,19 @@ rte_bus_get_iommu_class(void)
> >   			mode |= bus->get_iommu_class();
> >   	}
> >
> > -	if (mode != RTE_IOVA_VA) {
> > -		/* Use default IOVA mode */
> > -		mode = RTE_IOVA_PA;
> > +	if (mode == RTE_IOVA_VA)
> > +		return RTE_IOVA_VA;
> > +
> > +	if (mode & RTE_IOVA_PA) {
> > +		/* Not all buses support RTE_IOVA_VA, fallback to RTE_IOVA_PA */
> > +		return RTE_IOVA_PA;
> > +	}
> > +
> > +	if (rte_eal_using_phys_addrs()) {
> > +		/* Default to RTE_IOVA_PA only if it's supported */
> > +		return RTE_IOVA_PA;
> >   	}
> > -	return mode;
> > +
> > +	/* Since RTE_IOVA_PA is unsupported, fallback to RTE_IOVA_VA */
> > +	return RTE_IOVA_VA;
> >   }
> >
> 
> This is a good change, however I think that this is too pessimistic. If I don't
> have any devices that explicitly require IOVA_PA, I should be running in
> IOVA_VA mode.

Another problem may occur when hotplugging devices that support only 39-bit DMA addressing. In RTE_IOVA_VA mode you may not be able to map any memory through VFIO for such a device, because virtual addresses are likely to occupy more than 39 bits.
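
To make the 39-bit concern concrete, here is a minimal sketch of the range check involved. The helper name iova_range_fits() is hypothetical and is not a DPDK or VFIO API; the addresses mentioned afterwards are only example values.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if every byte of the range
 * [iova, iova + len) can be addressed with `width` address bits.
 */
static bool
iova_range_fits(uint64_t iova, uint64_t len, unsigned int width)
{
	uint64_t limit;

	if (width >= 64) {
		/* Full 64-bit space: only reject ranges that wrap around. */
		return len == 0 || iova <= UINT64_MAX - (len - 1);
	}

	limit = (uint64_t)1 << width; /* first address out of reach */
	return len <= limit && iova <= limit - len;
}
```

With width set to 39, a 2 MB region at a typical x86-64 user-space virtual address such as 0x7f0000000000 does not fit, while the same region at a low physical address such as 0x100000000 does.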

The rte_pci bus enforces RTE_IOVA_PA whenever it finds such devices on init.

I have no doubt the logic can be improved here, but for now RTE_IOVA_PA is the only safe default.
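
For reference, the decision order in the quoted v2 patch can be restated as a standalone function. This is only a sketch: the sk_ and SK_ names are stand-ins invented here, the enum values are assumed to be bit flags with "don't care" equal to zero, and the phys_addrs_available parameter replaces the call to rte_eal_using_phys_addrs().

```c
#include <stdbool.h>

/* Stand-ins for the IOVA mode flags; names and values are assumptions. */
enum sk_iova_mode {
	SK_IOVA_DC = 0,      /* no bus expressed a preference */
	SK_IOVA_PA = 1 << 0, /* at least one bus needs physical addresses */
	SK_IOVA_VA = 1 << 1, /* at least one bus can use virtual addresses */
};

/*
 * `mode` is the OR of what every bus reported.
 * `phys_addrs_available` says whether physical addresses can be obtained.
 */
static enum sk_iova_mode
sk_pick_iova_mode(unsigned int mode, bool phys_addrs_available)
{
	/* Every bus that answered asked for VA and none asked for PA. */
	if (mode == SK_IOVA_VA)
		return SK_IOVA_VA;

	/* Some bus needs PA, so VA cannot be used for all of them. */
	if (mode & SK_IOVA_PA)
		return SK_IOVA_PA;

	/* No preference at all: keep PA as the default when it can work. */
	if (phys_addrs_available)
		return SK_IOVA_PA;

	/* PA is unusable (for example an unprivileged user), so try VA. */
	return SK_IOVA_VA;
}
```

The only behavioural change compared to the old code is the last branch: with no bus preference and no access to physical addresses, the function returns VA instead of PA.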

D.

> 
> This of course doesn't take hotplug into account, so a command-line switch
> to force one or the other should also be available.
> 
> For example, at startup, I might have devices bound to VFIO, so IOVA_VA
> mode is picked. However, even though at startup none of the
> devices require physical addresses, I also know that I might later hotplug a
> device that requires IOVA_PA (leaving the question of hotplug brokenness
> aside for now...) - currently, this scenario will not work, as I will be forced to
> use IOVA_VA mode unless I happen to have an IOVA_PA device available at
> startup.
> 
> Similarly, if I'm running DPDK as root but am only using virtual devices like
> pcap, I should be able to force DPDK into using VA addresses [*], yet
> currently I will be forced to use IOVA_PA if I don't *also* have a few devices
> bound exclusively to VFIO.
> 
> [*] Do we have vdev devices that require IOVA_PA? I can't think of any...
> 
> --
> Thanks,
> Anatoly
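
The policy Anatoly describes above (a user-forced mode wins, otherwise VA unless some device explicitly requires PA) could be sketched as follows. Everything here is hypothetical: the cli_ and CLI_ names are invented for illustration, no such command-line switch is defined in this thread, and the sketch does not address the 39-bit hotplug case raised in the reply.

```c
/* Illustrative mode flags; names and values are assumptions. */
enum cli_iova_mode {
	CLI_IOVA_DC = 0,      /* nothing requested */
	CLI_IOVA_PA = 1 << 0,
	CLI_IOVA_VA = 1 << 1,
};

/*
 * `forced` is what a hypothetical command-line switch asked for
 * (CLI_IOVA_DC when the switch is absent).
 * `bus_mode` is the OR of what every bus reported at startup.
 */
static enum cli_iova_mode
cli_resolve_iova_mode(enum cli_iova_mode forced, unsigned int bus_mode)
{
	/* An explicit user choice overrides autodetection. */
	if (forced != CLI_IOVA_DC)
		return forced;

	/* A device that explicitly requires PA rules out VA. */
	if (bus_mode & CLI_IOVA_PA)
		return CLI_IOVA_PA;

	/* Otherwise prefer VA, as suggested in the quoted message. */
	return CLI_IOVA_VA;
}
```

Forcing PA at startup covers the case where a PA-only device will be hotplugged later; forcing VA covers running as root with only virtual devices.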

Thread overview: 7+ messages
2018-09-07 15:47 [dpdk-dev] [PATCH] " Darek Stojaczyk
2018-09-07 15:58 ` [dpdk-dev] [PATCH v2] " Darek Stojaczyk
2018-09-17 10:33   ` Burakov, Anatoly
2018-09-17 13:06     ` Stojaczyk, Dariusz [this message]
2018-10-30 12:58       ` Alejandro Lucero
2018-10-28 23:11     ` Thomas Monjalon
2018-10-30 10:25       ` Burakov, Anatoly
