From: David Marchand
Date: Tue, 30 Jul 2019 09:21:49 +0200
To: Anatoly Burakov
Cc: dev, John McNamara, Marko Kovacevic, dariusz.stojaczyk@intel.com, Thomas Monjalon, Jerin Jacob Kollanukkaran
Subject: Re: [dpdk-dev] [PATCH v5] eal: pick IOVA as PA if IOMMU is not available

On Mon, Jul 29, 2019 at 5:03 PM Anatoly Burakov wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov
> Tested-by: Darek Stojaczyk
> Tested-by: Jerin Jacob
> Reviewed-by: Jerin Jacob
> ---
>
> Notes:
>     v5:
>     - Clarify docs on FreeBSD
>     - Move IOMMU detection code out of VFIO sources
>
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst | 27 ++++++----
>  doc/guides/rel_notes/known_issues.rst | 26 ++++++++++
>  doc/guides/rel_notes/release_19_08.rst | 16 ++++++
>  lib/librte_eal/linux/eal/eal.c | 50 ++++++++++++++++++-
>  4 files changed, 107 insertions(+), 12 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..94f30fd5d 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,7 +425,8 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> -Below is the 2-step heuristic for this choice.
> +On FreeBSD, RTE_IOVA_PA is always the default. On Linux, the IOVA mode is
> +detected based on a 2-step heuristic detailed below.
>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
>  and decides on a preferred IOVA mode.
> @@ -438,20 +439,26 @@ and decides on a preferred IOVA mode.
>    RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>    check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +In the case when the buses had disagreed on their preferred IOVA mode, part of
> +the buses won't work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -   The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +   The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +   following reasons:
>
>     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>       physical address availability.
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>     ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but there
> +   may be cases where the default may be unsuitable (for example, hotplugging
> +   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
> +   initialization).
> +
> +**Implication**
> +   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
> +   mode being automatically picked by EAL.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>     Also, make sure to start the actual text at the margin.
>     =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
> +  behavior has now been changed to handle IOVA mode detection in a more complex
> +  manner, and will default to IOVA as VA in most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>     =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  there may be cases where the default may be unsuitable.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..6ed602c90 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -66,6 +66,8 @@
>
>  #define SOCKET_MEM_STRLEN (RTE_MAX_NUMA_NODES * 10)
>
> +#define KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  /* Allow the application to print its usage message too if set */
>  static rte_usage_hook_t rte_application_usage_hook = NULL;
>
> @@ -951,6 +953,33 @@ static void rte_eal_init_alert(const char *msg)
>  	RTE_LOG(ERR, EAL, "%s\n", msg);
>  }
>
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +static bool
> +is_iommu_enabled(void)
> +{
> +	DIR *dir = opendir(KERNEL_IOMMU_GROUPS_PATH);
> +	struct dirent *d;
> +	int n = 0;
> +
> +	/* if directory doesn't exist, assume IOMMU is not enabled */
> +	if (dir == NULL)
> +		return false;
> +
> +	while ((d = readdir(dir)) != NULL) {
> +		/* skip dot and dot-dot */
> +		if (++n > 2)
> +			break;
> +	}
> +	closedir(dir);
> +
> +	return n > 2;
> +}
> +
>  /* Launch threads, called at application init(). */
>  int
>  rte_eal_init(int argc, char **argv)
> @@ -1061,8 +1090,25 @@ rte_eal_init(int argc, char **argv)
>  		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>  		if (iova_mode == RTE_IOVA_DC) {
> -			iova_mode = RTE_IOVA_VA;
> -			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +			if (!phys_addrs) {
> +				/* if we have no access to physical addresses,
> +				 * pick IOVA as VA mode.
> +				 */
> +				iova_mode = RTE_IOVA_VA;
> +				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +			} else if (is_iommu_enabled()) {
> +				/* we have an IOMMU, pick IOVA as VA mode */
> +				iova_mode = RTE_IOVA_VA;
> +				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +			} else {
> +				/* physical addresses available, and no IOMMU
> +				 * found, so pick IOVA as PA.
> +				 */
> +				iova_mode = RTE_IOVA_PA;
> +				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +			}
>  		}
>  #ifdef RTE_LIBRTE_KNI
>  		/* Workaround for KNI which requires physical address to work */
> --
> 2.17.1

Reviewed-by: David Marchand

--
David Marchand