From: David Marchand
Date: Wed, 10 Jul 2019 10:09:50 +0200
To: Jerin Jacob Kollanukkaran, "Burakov, Anatoly"
Cc: dev, Thomas Monjalon, Ben Walker
Subject: Re: [dpdk-dev] [EXT] Re: [PATCH] bus/pci: fix IOVA as VA mode selection

Hello guys,

On Tue, Jul 9, 2019 at 7:52 PM Jerin Jacob Kollanukkaran wrote:

> > -----Original Message-----
> > From: Burakov, Anatoly
> > Sent: Tuesday, July 9, 2019 8:07 PM
> > To: Jerin Jacob Kollanukkaran; David Marchand
> > Cc: dev; Thomas Monjalon; Ben Walker
> > Subject: Re: [EXT] Re: [dpdk-dev] [PATCH] bus/pci: fix IOVA as VA
> > mode selection issue.
> >
> > >> I wouldn't classify this as "needing" IOVA. "Need" implies it
> > >> cannot work without it, whereas in this case it's more of a
> > >> "highly recommended" rather than "need".
> > >
> > > It is "need" as performance is horrible without it as is per
> > > packet SW translation.
> > > A "need" for DPDK performance perspective.
> >
> > Would the driver fail to initialize if it detects running as IOVA
> > as PA?
>
> Yes.
> https://git.dpdk.org/dpdk/tree/drivers/net/octeontx2/otx2_ethdev.c#n1191
>
> > Also, some other use cases will also require IOVA as PA while
> > having full IOMMU support. An example of this would be systems with
> > limited IOMMU width (such as VM's) - even though the IOMMU is
> > technically supported, we may not have the necessary address width
> > to run all devices in IOVA as VA mode, and would need to fall back
> > to IOVA as PA.
> > Since we cannot *require* IOVA as VA in current codebase, any
> > driver that expects IOVA as VA to always be enabled will presumably
> > not work.
> >
> > >
> > > Again, it is not device attribute, it is system attribute.
> >
> > If it's a system attribute, why is it a device driver flag then?
> > The system may or may not support IOMMU, the device itself probably
> > doesn't care since bus address looks the same in both cases, *but
> > the driver might* (such as would be in your case - requiring IOVA
> > as VA and disallowing IOVA as PA for performance reasons).
>
> Agree.
>
> > Currently (again, disregarding your interpretation of how IOVA as
> > VA works and looking at the actual commit history), we always seem
> > to imply that IOVA as PA works for all devices, and we use
> > IOVA_AS_VA flag to indicate that the device *also* supports IOVA as
> > VA mode.
> >
> > But we don't have any way to express a *requirement* for IOVA as VA
> > mode - only for IOVA as PA mode. That is the purpose of the new
> > flag. You are stating that the IOVA_AS_VA drv flag is an expression
> > of that requirement, but that is not reflected in the codebase -
> > our commit history indicates that we don't treat IOVA as VA as hard
> > requirement whenever this flag is specified (and i would argue that
> > we shouldn't).
>
> No objection to further classify it.

I propose to introduce:

* RTE_PCI_DRV_IOVA_AS_PA, which means "the combination of the
  pmd+kmod+hw supports usage of Physical Addresses"
* RTE_PCI_DRV_IOVA_AS_VA, which means "the combination of the
  pmd+kmod+hw supports usage of Virtual Addresses"

- For the pci bus, the algorithm would be:

    devices_want_pa = false
    devices_want_va = false
    Foreach pci device
        Skip blacklisted devices
        Skip unbound devices (i.e. we only consider devices bound to a
        known kernel driver)
        Skip unsupported devices (i.e.
        we only consider devices that have a pmd that supports them)
        If the combination pmd+kmod only supports VA
        (RTE_PCI_DRV_IOVA_AS_VA capability in driver flags), then
            devices_want_va = true
        Else if the combination pmd+kmod only supports PA
        (RTE_PCI_DRV_IOVA_AS_PA capability in driver flags), then
            devices_want_pa = true
    If devices_want_va and !devices_want_pa
        return RTE_IOVA_VA
    If devices_want_pa and !devices_want_va
        return RTE_IOVA_PA
    return RTE_IOVA_DC

Notes:

* the IOMMU limitations are considered a per device/driver thing,
  since the kmod is the one that configures the system IOMMU,
* the case "devices_want_pa and devices_want_va" is treated as DC: we
  let EAL decide based on the availability of physical addresses,
  because we can't satisfy all the devices/drivers present in the
  system. This means that at bus probe time for a device, we must add
  a check that the combination is fulfilled (and avoid this check in
  the drivers themselves).

- For the global bus code, which aggregates the preferences of the
  different buses, we need to do the same; I suspect the current code
  has a bug here.

  The algorithm:

    buses_want_pa = false
    buses_want_va = false
    Foreach bus
        If the bus reports RTE_IOVA_VA, then
            buses_want_va = true
        Else if the bus reports RTE_IOVA_PA, then
            buses_want_pa = true
    If buses_want_va and !buses_want_pa
        return RTE_IOVA_VA
    If buses_want_pa and !buses_want_va
        return RTE_IOVA_PA
    return RTE_IOVA_DC

- Finally at EAL level, we keep the current code.

Hope I did not miss anything.
If we agree on this, I will send the changes and an update in the
documentation.

-- 
David Marchand
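
For reference, the two aggregation steps above can be sketched in C roughly as follows. This is only an illustration of the proposed logic, not the eventual patch. The enum, the flag bit values and struct pci_dev_info are local stand-ins invented for this example (RTE_PCI_DRV_IOVA_AS_PA is only proposed in this mail), and "only supports VA" is read here as "VA flag set and PA flag clear", which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-ins for illustration; these are NOT the DPDK definitions. */
enum iova_mode { IOVA_DC, IOVA_PA, IOVA_VA };

/* Hypothetical bit values for the two proposed driver capability flags. */
#define DRV_IOVA_AS_PA 0x1u
#define DRV_IOVA_AS_VA 0x2u

/* Hypothetical summary of what the bus scan knows about one device. */
struct pci_dev_info {
	bool blacklisted;       /* excluded by the user */
	bool bound;             /* bound to a known kernel driver */
	bool has_pmd;           /* some pmd supports this device */
	unsigned int drv_flags; /* DRV_IOVA_AS_* capabilities of pmd+kmod */
};

/* Shared last step: one clear preference wins, anything else is DC. */
static enum iova_mode
resolve_iova_mode(bool want_pa, bool want_va)
{
	if (want_va && !want_pa)
		return IOVA_VA;
	if (want_pa && !want_va)
		return IOVA_PA;
	return IOVA_DC; /* no preference, or conflicting preferences */
}

/* Per-bus step for pci: collect the wishes of the usable devices. */
static enum iova_mode
pci_bus_iova_mode(const struct pci_dev_info *devs, size_t n)
{
	bool want_pa = false, want_va = false;

	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct pci_dev_info *d = &devs[i];
		bool pa, va;

		/* Skip blacklisted, unbound and unsupported devices. */
		if (d->blacklisted || !d->bound || !d->has_pmd)
			continue;
		pa = (d->drv_flags & DRV_IOVA_AS_PA) != 0;
		va = (d->drv_flags & DRV_IOVA_AS_VA) != 0;
		if (va && !pa)
			want_va = true;
		else if (pa && !va)
			want_pa = true;
	}
	return resolve_iova_mode(want_pa, want_va);
}

/* Global step: aggregate what each bus reported. */
static enum iova_mode
buses_iova_mode(const enum iova_mode *modes, size_t n)
{
	bool want_pa = false, want_va = false;

	assert(modes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (modes[i] == IOVA_VA)
			want_va = true;
		else if (modes[i] == IOVA_PA)
			want_pa = true;
	}
	return resolve_iova_mode(want_pa, want_va);
}
```

In this sketch, a device that advertises both flags (or neither) expresses no preference, and a system with one VA-only and one PA-only device ends up with DC, so the final decision is left to EAL as described in the notes.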