From: Thomas Monjalon
To: David Christensen
Cc: david.marchand@redhat.com, dev@dpdk.org, "Burakov, Anatoly"
Date: Wed, 07 Oct 2020 14:49:30 +0200
Message-ID: <1611210.Cl7YQ8O76l@thomas>
In-Reply-To: <2c830988-c4db-7bdc-50f3-3fa445a81673@intel.com>
References: <20200630213823.1583764-1-drc@linux.vnet.ibm.com> <20200810210707.745083-2-drc@linux.vnet.ibm.com> <2c830988-c4db-7bdc-50f3-3fa445a81673@intel.com>
Subject: Re: [dpdk-dev] [PATCH v3 1/1] vfio: modify spapr iommu support to use static window sizing

Hi David,

Do you plan to send a v4?

17/09/2020 13:13, Burakov, Anatoly:
> On 10-Aug-20 10:07 PM, David Christensen wrote:
> > The SPAPR IOMMU requires that a DMA window size be defined before memory
> > can be mapped for DMA.
> > Current code dynamically modifies the DMA window
> > size in response to every new memory allocation which is potentially
> > dangerous because all existing mappings need to be unmapped/remapped in
> > order to resize the DMA window, leaving hardware holding IOVA addresses
> > that are temporarily unmapped. The new SPAPR code statically assigns
> > the DMA window size on first use, using the largest physical memory
> > address when IOVA=PA and the highest existing memseg virtual
> > address when IOVA=VA.
> >
> > Signed-off-by: David Christensen
> > ---
>
> > +struct spapr_size_walk_param {
> > +	uint64_t max_va;
> > +	uint64_t page_sz;
> > +	int external;
> > +};
> > +
> > +/*
> > + * In order to set the DMA window size required for the SPAPR IOMMU
> > + * we need to walk the existing virtual memory allocations as well as
> > + * find the hugepage size used.
> > + */
> >  static int
> > -vfio_spapr_unmap_walk(const struct rte_memseg_list *msl,
> > -		const struct rte_memseg *ms, void *arg)
> > +vfio_spapr_size_walk(const struct rte_memseg_list *msl, void *arg)
> >  {
> > -	int *vfio_container_fd = arg;
> > +	struct spapr_size_walk_param *param = arg;
> > +	uint64_t max = (uint64_t) msl->base_va + (uint64_t) msl->len;
> >
> > -	/* skip external memory that isn't a heap */
> > -	if (msl->external && !msl->heap)
> > -		return 0;
> > +	if (msl->external) {
> > +		param->external++;
> > +		if (!msl->heap)
> > +			return 0;
> > +	}
>
> It would be nice to have some comments in the code explaining what we're
> skipping and why.
>
> Also, seems that you're using param->external as bool? This is a
> non-public API so using stdbool is not an issue here, perhaps replace it
> with bool param->has_external?
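
For illustration only, the bool suggestion above could look roughly like
the sketch below. It is not taken from the patch: struct msl_view is a
hypothetical stand-in for the few rte_memseg_list fields the walk reads
(so the example builds without DPDK headers), and size_walk() is a
hypothetical name for the callback body.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rte_memseg_list fields used by the walk. */
struct msl_view {
	uint64_t base_va;
	uint64_t len;
	uint64_t page_sz;
	bool external;
	bool heap;
};

struct spapr_size_walk_param {
	uint64_t max_va;
	uint64_t page_sz;
	bool has_external;	/* a flag instead of the int counter */
};

/* Hypothetical name; mirrors the logic of vfio_spapr_size_walk(). */
static int
size_walk(const struct msl_view *msl, struct spapr_size_walk_param *param)
{
	uint64_t max;

	assert(msl != NULL && param != NULL);
	max = msl->base_va + msl->len;

	if (msl->external) {
		/* remember it so the caller can warn about external memory */
		param->has_external = true;
		/* external memory that isn't a heap does not size the window */
		if (!msl->heap)
			return 0;
	}

	/* track the highest VA seen and the page size of that list */
	if (max > param->max_va) {
		param->page_sz = msl->page_sz;
		param->max_va = max;
	}

	return 0;
}
```

With a flag, the later check in spapr_dma_win_size() would read
"if (param.has_external)" rather than "if (param.external > 0)".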
>
> >
> > -	/* skip any segments with invalid IOVA addresses */
> > -	if (ms->iova == RTE_BAD_IOVA)
> > -		return 0;
> > +	if (max > param->max_va) {
> > +		param->page_sz = msl->page_sz;
> > +		param->max_va = max;
> > +	}
> >
> > -	return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
> > -			ms->len, 0);
> > +	return 0;
> >  }
> >
> > -struct spapr_walk_param {
> > -	uint64_t window_size;
> > -	uint64_t hugepage_sz;
> > -};
> > -
> > +/*
> > + * The SPAPRv2 IOMMU supports 2 DMA windows with starting
> > + * address at 0 or 1<<59.  By default, a DMA window is set
> > + * at address 0, 2GB long, with a 4KB page.  For DPDK we
> > + * must remove the default window and setup a new DMA window
> > + * based on the hugepage size and memory requirements of
> > + * the application before we can map memory for DMA.
> > + */
> >  static int
> > -vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
> > -		const struct rte_memseg *ms, void *arg)
> > +spapr_dma_win_size(void)
> >  {
> > -	struct spapr_walk_param *param = arg;
> > -	uint64_t max = ms->iova + ms->len;
> > +	struct spapr_size_walk_param param;
> >
> > -	/* skip external memory that isn't a heap */
> > -	if (msl->external && !msl->heap)
> > +	/* only create DMA window once */
> > +	if (spapr_dma_win_len > 0)
> >  		return 0;
> >
> > -	/* skip any segments with invalid IOVA addresses */
> > -	if (ms->iova == RTE_BAD_IOVA)
> > -		return 0;
> > +	/* walk the memseg list to find the page size/max VA address */
> > +	memset(&param, 0, sizeof(param));
> > +	if (rte_memseg_list_walk(vfio_spapr_size_walk, &param) < 0) {
> > +		RTE_LOG(ERR, EAL, "Failed to walk memseg list for DMA "
> > +			"window size\n");
> > +		return -1;
> > +	}
> > +
> > +	/* We can't be sure if DMA window covers external memory */
> > +	if (param.external > 0)
> > +		RTE_LOG(WARNING, EAL, "Detected external memory which may "
> > +			"not be managed by the IOMMU\n");
> > +
> > +	/* find the maximum IOVA address for setting the DMA window size */
> > +	if (rte_eal_iova_mode() == RTE_IOVA_PA) {
> > +		static const char proc_iomem[] = "/proc/iomem";
> > +		static const char str_sysram[] = "System RAM";
> > +		uint64_t start, end, max = 0;
> > +		char *line = NULL;
> > +		char *dash, *space;
> > +		size_t line_len;
> > +
> > +		/*
> > +		 * Example "System RAM" in /proc/iomem:
> > +		 * 00000000-1fffffffff : System RAM
> > +		 * 200000000000-201fffffffff : System RAM
> > +		 */
> > +		FILE *fd = fopen(proc_iomem, "r");
> > +		if (fd == NULL) {
> > +			RTE_LOG(ERR, EAL, "Cannot open %s\n", proc_iomem);
> > +			return -1;
> > +		}
> > +		/* Scan /proc/iomem for the highest PA in the system */
> > +		while (getline(&line, &line_len, fd) != -1) {
> > +			if (strstr(line, str_sysram) == NULL)
> > +				continue;
> > +
> > +			space = strstr(line, " ");
> > +			dash = strstr(line, "-");
> > +
> > +			/* Validate the format of the memory string */
> > +			if (space == NULL || dash == NULL || space < dash) {
> > +				RTE_LOG(ERR, EAL, "Can't parse line \"%s\" in "
> > +					"file %s\n", line, proc_iomem);
> > +				continue;
> > +			}
> > +
> > +			start = strtoull(line, NULL, 16);
> > +			end = strtoull(dash + 1, NULL, 16);
> > +			RTE_LOG(DEBUG, EAL, "Found system RAM from 0x%"
> > +				PRIx64 " to 0x%" PRIx64 "\n", start, end);
> > +			if (end > max)
> > +				max = end;
> > +		}
> > +		free(line);
> > +		fclose(fd);
>
> I would've put all of this file reading business into a separate
> function, as otherwise it's a bit hard to follow the mix of file ops and
> using the results. Something like
>
> value = get_value_from_iomem();
> if (value > ...)
>     ...
>
> is much easier on the eyes :)
>
> >
> > -	if (max > param->window_size) {
> > -		param->hugepage_sz = ms->hugepage_sz;
> > -		param->window_size = max;
> > +		if (max == 0) {
> > +			RTE_LOG(ERR, EAL, "Failed to find valid \"System RAM\" "
> > +				"entry in file %s\n", proc_iomem);
> > +			return -1;
> > +		}
> > +
> > +		spapr_dma_win_len = rte_align64pow2(max + 1);
> > +		RTE_LOG(DEBUG, EAL, "Setting DMA window size to 0x%"
> > +			PRIx64 "\n", spapr_dma_win_len);
> > +	} else if (rte_eal_iova_mode() == RTE_IOVA_VA) {
> > +		RTE_LOG(DEBUG, EAL, "Highest VA address in memseg list is 0x%"
> > +			PRIx64 "\n", param.max_va);
> > +		spapr_dma_win_len = rte_align64pow2(param.max_va);
> > +		RTE_LOG(DEBUG, EAL, "Setting DMA window size to 0x%"
> > +			PRIx64 "\n", spapr_dma_win_len);
> > +	} else {
> > +		RTE_LOG(ERR, EAL, "Unsupported IOVA mode\n");
> > +		return -1;
> >  	}
> >
> > +	spapr_dma_win_page_sz = param.page_sz;
> > +	rte_mem_set_dma_mask(__builtin_ctzll(spapr_dma_win_len));
> >  	return 0;
> >  }
> >
> >  static int
> > -vfio_spapr_create_new_dma_window(int vfio_container_fd,
> > -		struct vfio_iommu_spapr_tce_create *create) {
> > +vfio_spapr_create_dma_window(int vfio_container_fd)
> > +{
> > +	struct vfio_iommu_spapr_tce_create create = {
> > +		.argsz = sizeof(create), };
> >  	struct vfio_iommu_spapr_tce_remove remove = {
> > -		.argsz = sizeof(remove),
> > -	};
> > +		.argsz = sizeof(remove), };
> >  	struct vfio_iommu_spapr_tce_info info = {
> > -		.argsz = sizeof(info),
> > -	};
> > +		.argsz = sizeof(info), };
> >  	int ret;
> >
> > -	/* query spapr iommu info */
> > +	ret = spapr_dma_win_size();
> > +	if (ret < 0)
> > +		return ret;
> > +
> >  	ret = ioctl(vfio_container_fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
> >  	if (ret) {
> > -		RTE_LOG(ERR, EAL, " cannot get iommu info, "
> > -			"error %i (%s)\n", errno, strerror(errno));
>
> Here and in other similar places, no need to split strings into multiline.
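
For reference, the window sizing in the hunk above reduces to a round-up
to a power of two followed by a trailing-zero count. The sketch below
assumes GCC or Clang (for __builtin_ctzll). align64pow2() and
dma_mask_bits() are hypothetical stand-ins written so the example builds
without DPDK headers; the patch itself calls rte_align64pow2() and passes
the bit count to rte_mem_set_dma_mask().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for rte_align64pow2(): round v up to the next
 * power of two; a value that is already a power of two is returned as is.
 */
static uint64_t
align64pow2(uint64_t v)
{
	v--;
	v |= v >> 1;
	v |= v >> 2;
	v |= v >> 4;
	v |= v >> 8;
	v |= v >> 16;
	v |= v >> 32;
	return v + 1;
}

/*
 * Hypothetical helper: number of address bits covered by a window whose
 * length is a non-zero power of two (the value the patch hands to
 * rte_mem_set_dma_mask()).
 */
static unsigned int
dma_mask_bits(uint64_t win_len)
{
	/* __builtin_ctzll() is undefined for 0, so check the precondition */
	assert(win_len != 0 && (win_len & (win_len - 1)) == 0);
	return (unsigned int)__builtin_ctzll(win_len);
}
```

Taking the second range from the /proc/iomem example in the patch
comment, max = 0x201fffffffff gives align64pow2(max + 1) =
0x400000000000, i.e. a 46-bit DMA mask.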
>
> Overall, since these changes are confined to PPC64 I can't really test
> these, but with the above changes:
>
> Reviewed-by: Anatoly Burakov
>
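
One possible shape for the helper suggested in the review (the
get_value_from_iomem() idea) is sketched below. It is an illustration,
not the patch author's code: iomem_max_sysram() is a hypothetical name,
it takes an already opened stream so it can be exercised without
/proc/iomem, and it uses fgets() with a fixed buffer instead of the
patch's getline() to stay within standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the highest end address of any
 * "System RAM" range in a /proc/iomem-style stream, or 0 if no valid
 * entry is found. Lines look like:
 *   200000000000-201fffffffff : System RAM
 */
static uint64_t
iomem_max_sysram(FILE *f)
{
	char line[256];	/* assumption: entries fit in one buffer */
	uint64_t max = 0;

	assert(f != NULL);

	while (fgets(line, sizeof(line), f) != NULL) {
		const char *dash, *space;
		uint64_t end;

		if (strstr(line, "System RAM") == NULL)
			continue;

		dash = strchr(line, '-');
		space = strchr(line, ' ');

		/* same format check as the patch: "start-end : name" */
		if (dash == NULL || space == NULL || space < dash)
			continue;

		end = strtoull(dash + 1, NULL, 16);
		if (end > max)
			max = end;
	}

	return max;
}
```

The caller would then only open the file, call the helper, close the
file, and treat a return value of 0 as "no valid System RAM entry",
which keeps the file handling apart from the window-size logic.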