From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <8b7148c100678ee2f4cd9b94168b006e851fe6ef.1564052753.git.anatoly.burakov@intel.com>
From: David Marchand
Date: Mon, 29 Jul 2019 11:31:23 +0200
To: Anatoly Burakov
Cc: dev, John McNamara, Marko Kovacevic, dariusz.stojaczyk@intel.com,
 Thomas Monjalon, Jerin Jacob Kollanukkaran
Content-Type: text/plain; charset="UTF-8"
Subject: Re: [dpdk-dev] [PATCH v4] eal: pick IOVA
 as PA if IOMMU is not available
List-Id: DPDK patches and discussions

On Fri, Jul 26, 2019 at 5:37 PM Anatoly Burakov wrote:
>
> When IOMMU is not available, /sys/kernel/iommu_groups will not be
> populated. This is happening since at least 3.6 when VFIO support
> was added. If the directory is empty, EAL should not pick IOVA as
> VA as the default IOVA mode.
>
> Signed-off-by: Anatoly Burakov
> Tested-by: Darek Stojaczyk
> Tested-by: Jerin Jacob
> Reviewed-by: Jerin Jacob
> ---
>
> Notes:
>     v4:
>     - Fix indentation in release notes' known issues
>
>     v3:
>     - Add documentation changes
>     - Fix a typo pointed out by checkpatch
>
>     v2:
>     - Decouple IOMMU from VFIO
>     - Add a check for physical addresses availability
>
>  .../prog_guide/env_abstraction_layer.rst      | 27 +++++++++++------
>  doc/guides/rel_notes/known_issues.rst         | 26 ++++++++++++++++
>  doc/guides/rel_notes/release_19_08.rst        | 16 ++++++++++
>  lib/librte_eal/linux/eal/eal.c                | 21 +++++++++++--
>  lib/librte_eal/linux/eal/eal_vfio.c           | 30 +++++++++++++++++++
>  lib/librte_eal/linux/eal/eal_vfio.h           |  2 ++
>  6 files changed, 111 insertions(+), 11 deletions(-)
>
> diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
> index 1487ea550..e6e70e5a8 100644
> --- a/doc/guides/prog_guide/env_abstraction_layer.rst
> +++ b/doc/guides/prog_guide/env_abstraction_layer.rst
> @@ -425,6 +425,9 @@ IOVA Mode Detection
>  IOVA Mode is selected by considering what the current usable Devices on the
>  system require and/or support.
>
> +On FreeBSD, RTE_IOVA_VA mode is not supported, so RTE_IOVA_PA is always used.

We still allow setting it via --iova-mode=
Is it really unsupported ? vdev like rings could work.

> +On Linux, the IOVA mode is detected based on a heuristic.
> +
>  Below is the 2-step heuristic for this choice.

We can combine those two sentences as a single one.

>
>  For the first step, EAL asks each bus its requirement in terms of IOVA mode
> @@ -438,20 +441,26 @@ and decides on a preferred IOVA mode.
>  RTE_IOVA_VA), then the preferred IOVA mode is RTE_IOVA_DC (see below with the
>  check on Physical Addresses availability),
>
> +If the buses have expressed no preference on which IOVA mode to pick, then a
> +default is selected using the following logic:
> +
> +- if physical addresses are not available, RTE_IOVA_VA mode is used
> +- if /sys/kernel/iommu_groups is not empty, RTE_IOVA_VA mode is used
> +- otherwise, RTE_IOVA_PA mode is used
> +
> +In the case when the buses had disagreed on their preferred IOVA mode, part of
> +the buses won't work because of this decision.
> +
>  The second step checks if the preferred mode complies with the Physical
>  Addresses availability since those are only available to root user in recent
> -kernels.
> -
> -- if the preferred mode is RTE_IOVA_PA but there is no access to Physical
> -  Addresses, then EAL init fails early, since later probing of the devices
> -  would fail anyway,
> -- if the preferred mode is RTE_IOVA_DC then EAL selects the RTE_IOVA_VA mode.
> -  In the case when the buses had disagreed on the IOVA Mode at the first step,
> -  part of the buses won't work because of this decision.
> +kernels. Namely, if the preferred mode is RTE_IOVA_PA but there is no access to
> +Physical Addresses, then EAL init fails early, since later probing of the
> +devices would fail anyway.
>
>  .. note::
>
> -   The RTE_IOVA_VA mode is selected as the default for the following reasons:
> +   The RTE_IOVA_VA mode is preferred as the default in most cases for the
> +   following reasons:
>
>     - All drivers are expected to work in RTE_IOVA_VA mode, irrespective of
>       physical address availability.
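
As a reading aid for the documentation hunk above, the two steps it describes
can be sketched as two small pure functions. This is only an illustration
under stated assumptions: the enum and both function names are hypothetical
stand-ins and not DPDK API, and the two booleans stand for "physical addresses
are readable" and "/sys/kernel/iommu_groups is not empty".

```c
#include <stdbool.h>

/* Hypothetical stand-in for enum rte_iova_mode (illustration only). */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/*
 * Step 1 fallback: if the buses expressed no preference (IOVA_DC), pick a
 * default following the three documented bullets. A bus-provided preference
 * is returned unchanged.
 */
static enum iova_mode
pick_default_iova_mode(enum iova_mode bus_pref, bool phys_addrs,
		bool iommu_present)
{
	if (bus_pref != IOVA_DC)
		return bus_pref;
	if (!phys_addrs)
		return IOVA_VA;	/* no physical addresses: VA */
	if (iommu_present)
		return IOVA_VA;	/* IOMMU groups present: VA */
	return IOVA_PA;		/* otherwise: PA */
}

/*
 * Step 2: IOVA as PA cannot work without access to physical addresses,
 * which is the case where the text says EAL init fails early.
 */
static bool
iova_mode_is_usable(enum iova_mode mode, bool phys_addrs)
{
	return !(mode == IOVA_PA && !phys_addrs);
}
```

Each documented bullet maps onto one return statement, and the early-failure
case of the second step is the only input for which iova_mode_is_usable()
returns false.
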
> diff --git a/doc/guides/rel_notes/known_issues.rst b/doc/guides/rel_notes/known_issues.rst
> index 276327c15..0b50c8306 100644
> --- a/doc/guides/rel_notes/known_issues.rst
> +++ b/doc/guides/rel_notes/known_issues.rst
> @@ -861,3 +861,29 @@ AVX-512 support disabled
>
>  **Driver/Module**:
>     ALL.
> +
> +
> +Unsuitable IOVA mode may be picked as the default
> +----------------------------------------------------------------
> +**Description**
> +   Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +   attempt to pick a reasonable default based on a number of factors, but there
> +   may be cases where the default may be unsuitable (for example, hotplugging
> +   devices using `igb_uio` driver while having picked IOVA as VA mode on EAL
> +   initialization).
> +
> +**Implication**
> +   Some devices (hotplugged or otherwise) may not work due to incompatible IOVA
> +   mode being automatically picked by EAL.
> +
> +**Resolution/Workaround**:
> +   It is possible to force EAL to pick a particular IOVA mode by using the
> +   `--iova-mode` command-line parameter. If conflicting requirements are present
> +   (such as one device requiring IOVA as PA and one requiring IOVA as VA mode),
> +   there is no workaround.
> +
> +**Affected Environment/Platform**:
> +   Linux.
> +
> +**Driver/Module**:
> +   ALL.
> diff --git a/doc/guides/rel_notes/release_19_08.rst b/doc/guides/rel_notes/release_19_08.rst
> index c9bd3ce18..b399ca536 100644
> --- a/doc/guides/rel_notes/release_19_08.rst
> +++ b/doc/guides/rel_notes/release_19_08.rst
> @@ -56,6 +56,12 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =========================================================
>
> +* **EAL will now pick IOVA as VA mode as the default in most cases.**
> +
> +  Previously, preferred default IOVA mode was selected to be IOVA as PA. The
> +  behavior has now been changed to handle IOVA mode detection in a more complex
> +  manner, and will default to IOVA as VA in most cases.
> +
>  * **Added MCS lock.**
>
>    MCS lock provides scalability by spinning on a CPU/thread local variable
> @@ -436,6 +442,16 @@ Known Issues
>       =========================================================
>
>
> +* **Unsuitable IOVA mode may be picked as the default**
> +
> +  Not all kernel drivers and not all devices support all IOVA modes. EAL will
> +  attempt to pick a reasonable default based on a number of factors, but
> +  there may be cases where the default may be unsuitable.
> +
> +  It is recommended to use the `--iova-mode` command-line parameter if the
> +  default is not suitable.
> +
> +
>  Tested Platforms
>  ----------------
>
> diff --git a/lib/librte_eal/linux/eal/eal.c b/lib/librte_eal/linux/eal/eal.c
> index 34db78753..29972b896 100644
> --- a/lib/librte_eal/linux/eal/eal.c
> +++ b/lib/librte_eal/linux/eal/eal.c
> @@ -1061,8 +1061,25 @@ rte_eal_init(int argc, char **argv)
>  		enum rte_iova_mode iova_mode = rte_bus_get_iommu_class();
>
>  		if (iova_mode == RTE_IOVA_DC) {
> -			iova_mode = RTE_IOVA_VA;
> -			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode, select IOVA as VA mode.\n");
> +			RTE_LOG(DEBUG, EAL, "Buses did not request a specific IOVA mode.\n");
> +
> +			if (!phys_addrs) {
> +				/* if we have no access to physical addresses,
> +				 * pick IOVA as VA mode.
> +				 */
> +				iova_mode = RTE_IOVA_VA;
> +				RTE_LOG(DEBUG, EAL, "Physical addresses are unavailable, selecting IOVA as VA mode.\n");
> +			} else if (vfio_iommu_enabled()) {

How about:
s/vfio_iommu_enabled/is_iommu_available/

And the code would move from vfio specific files to eal.c.

> +				/* we have an IOMMU, pick IOVA as VA mode */
> +				iova_mode = RTE_IOVA_VA;
> +				RTE_LOG(DEBUG, EAL, "IOMMU is available, selecting IOVA as VA mode.\n");
> +			} else {
> +				/* physical addresses available, and no IOMMU
> +				 * found, so pick IOVA as PA.
> +				 */
> +				iova_mode = RTE_IOVA_PA;
> +				RTE_LOG(DEBUG, EAL, "IOMMU is not available, selecting IOVA as PA mode.\n");
> +			}
>  		}
>  #ifdef RTE_LIBRTE_KNI
>  		/* Workaround for KNI which requires physical address to work */
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
> index 501c74f23..92d290284 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.c
> +++ b/lib/librte_eal/linux/eal/eal_vfio.c
> @@ -2,6 +2,7 @@
>   * Copyright(c) 2010-2018 Intel Corporation
>   */
>
> +#include
>  #include
>  #include
>  #include
> @@ -19,6 +20,8 @@
>  #include "eal_vfio.h"
>  #include "eal_private.h"
>
> +#define VFIO_KERNEL_IOMMU_GROUPS_PATH "/sys/kernel/iommu_groups"
> +
>  #ifdef VFIO_PRESENT
>
>  #define VFIO_MEM_EVENT_CLB_NAME "vfio_mem_event_clb"
> @@ -2147,3 +2150,30 @@ rte_vfio_container_dma_unmap(__rte_unused int container_fd,
>  }
>
>  #endif /* VFIO_PRESENT */
> +
> +/*
> + * on Linux 3.6+, even if VFIO is not loaded, whenever IOMMU is enabled in the
> + * BIOS and in the kernel, /sys/kernel/iommu_groups path will contain kernel
> + * IOMMU groups. If IOMMU is not enabled, that path would be empty. Therefore,
> + * checking if the path is empty will tell us if IOMMU is enabled.
> + */
> +int
> +vfio_iommu_enabled(void)
> +{
> +	DIR *dir = opendir(VFIO_KERNEL_IOMMU_GROUPS_PATH);
> +	struct dirent *d;
> +	int n = 0;
> +
> +	/* if directory doesn't exist, assume IOMMU is not enabled */
> +	if (dir == NULL)
> +		return 0;
> +
> +	while ((d = readdir(dir)) != NULL) {
> +		/* skip dot and dot-dot */
> +		if (++n > 2)
> +			break;
> +	}
> +	closedir(dir);
> +
> +	return n > 2;
> +}
> diff --git a/lib/librte_eal/linux/eal/eal_vfio.h b/lib/librte_eal/linux/eal/eal_vfio.h
> index cb2d35fb1..58c7a7309 100644
> --- a/lib/librte_eal/linux/eal/eal_vfio.h
> +++ b/lib/librte_eal/linux/eal/eal_vfio.h
> @@ -133,6 +133,8 @@ vfio_has_supported_extensions(int vfio_container_fd);
>
>  int vfio_mp_sync_setup(void);
>
> +int vfio_iommu_enabled(void);
> +
>  #define EAL_VFIO_MP "eal_vfio_mp_sync"
>
>  #define SOCKET_REQ_CONTAINER 0x100
> --
> 2.17.1

--
David Marchand
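
For illustration, the directory check in the quoted vfio_iommu_enabled() could
also skip "." and ".." by name instead of counting entries, so it does not
depend on readdir() returning both dot entries. A minimal sketch, assuming the
is_iommu_available() rename suggested earlier in this mail; dir_has_entries()
is a hypothetical helper name, and none of this is taken from the patch as
posted.

```c
#include <dirent.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if 'path' is a readable directory holding at least one entry
 * other than "." and "..". A missing or unreadable directory counts as empty.
 * (Hypothetical helper, for illustration only.)
 */
static bool
dir_has_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *d;
	bool found = false;

	if (dir == NULL)
		return false;

	while ((d = readdir(dir)) != NULL) {
		/* skip dot and dot-dot by name */
		if (strcmp(d->d_name, ".") == 0 ||
				strcmp(d->d_name, "..") == 0)
			continue;
		found = true;
		break;
	}
	closedir(dir);

	return found;
}

/* IOMMU groups show up under this sysfs path when an IOMMU is enabled. */
static bool
is_iommu_available(void)
{
	return dir_has_entries("/sys/kernel/iommu_groups");
}
```

A caller in eal.c would then only need is_iommu_available(), with no
dependency on the VFIO files.
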