From mboxrd@z Thu Jan 1 00:00:00 1970
From: luca.boccassi@gmail.com
To: Takeshi Yoshimura
Cc: David Christensen, dpdk stable
Date: Tue, 11 Feb 2020 11:21:15 +0000
Message-Id: <20200211112216.3929-129-luca.boccassi@gmail.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20200211112216.3929-1-luca.boccassi@gmail.com>
References: <20200211112216.3929-1-luca.boccassi@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Subject: [dpdk-stable] patch 'vfio: fix mapping failures in ppc64le' has been queued to stable release 19.11.1
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to stable release 19.11.1

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 02/13/20. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Thanks.

Luca Boccassi

---
>From 80e2880c9097999377da2647bce6c195320db496 Mon Sep 17 00:00:00 2001
From: Takeshi Yoshimura
Date: Fri, 17 Jan 2020 13:25:55 +0900
Subject: [PATCH] vfio: fix mapping failures in ppc64le

[ upstream commit 986f2134c336a086054980903e819308dcfd43ce ]

ppc64le failed when using large physical memory. I found problems in my
two commits in the past.

In commit e072d16f8920 ("vfio: fix expanding DMA area in ppc64le"), I added
a sanity check using a mapped address to resolve an issue around expanding
IOMMU window, but this was not enough, since memory allocation can return
memory anywhere dependent on memory fragmentation. DPDK may still skip DMA
mapping and attempts to unmap non-mapped DMA during expanding IOMMU window.
As a result, SPDK apps using large physical memory frequently failed to
proceed the communication with NVMe and/or went into an infinite loop.

The root cause of the bug was in a gap between memory segments managed by
DPDK and firmware-level DMA mapping. DPDK's memory segments don't contain
the state of DMA mapping, and so, the memseg_walk cannot determine if an
iterated memory segment is mapped or not. This resulted in incorrect DMA
maps and unmaps.

At this time, I added the code to avoid iterating non-mapped memory
segments during DMA mapping. The memseg_walk iterates over memory segments
marked as "used", and so, the code sets memory segments that will be mapped
or unmapped as "free" transiently.

The commit db90b4969e2e ("vfio: retry creating sPAPR DMA window") allows
retrying different page levels and sizes to create DMA window. However, this
allows page sizes different from hugepage sizes. This inconsistency caused
failures at the time of DMA mapping after the window creation. This patch
fixes to retry only different page levels.

Fixes: e072d16f8920 ("vfio: fix expanding DMA area in ppc64le")
Fixes: db90b4969e2e ("vfio: retry creating sPAPR DMA window")

Signed-off-by: Takeshi Yoshimura
Reviewed-by: David Christensen
---
 lib/librte_eal/linux/eal/eal_vfio.c | 76 +++++++++++++----------------
 1 file changed, 33 insertions(+), 43 deletions(-)

diff --git a/lib/librte_eal/linux/eal/eal_vfio.c b/lib/librte_eal/linux/eal/eal_vfio.c
index 95f615c2e3..01b5ef3f42 100644
--- a/lib/librte_eal/linux/eal/eal_vfio.c
+++ b/lib/librte_eal/linux/eal/eal_vfio.c
@@ -532,6 +532,17 @@ vfio_mem_event_callback(enum rte_mem_event type, const void *addr, size_t len,
 		return;
 	}
 
+#ifdef RTE_ARCH_PPC_64
+	ms = rte_mem_virt2memseg(addr, msl);
+	while (cur_len < len) {
+		int idx = rte_fbarray_find_idx(&msl->memseg_arr, ms);
+
+		rte_fbarray_set_free(&msl->memseg_arr, idx);
+		cur_len += ms->len;
+		++ms;
+	}
+	cur_len = 0;
+#endif
 	/* memsegs are contiguous in memory */
 	ms = rte_mem_virt2memseg(addr, msl);
 	while (cur_len < len) {
@@ -551,6 +562,17 @@ next:
 		cur_len += ms->len;
 		++ms;
 	}
+#ifdef RTE_ARCH_PPC_64
+	cur_len = 0;
+	ms = rte_mem_virt2memseg(addr, msl);
+	while (cur_len < len) {
+		int idx = rte_fbarray_find_idx(&msl->memseg_arr, ms);
+
+		rte_fbarray_set_used(&msl->memseg_arr, idx);
+		cur_len += ms->len;
+		++ms;
+	}
+#endif
 }
 
 static int
@@ -1416,16 +1438,11 @@ vfio_spapr_dma_do_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 	return 0;
 }
 
-struct spapr_remap_walk_param {
-	int vfio_container_fd;
-	uint64_t addr_64;
-};
-
 static int
 vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
-	struct spapr_remap_walk_param *param = arg;
+	int *vfio_container_fd = arg;
 
 	/* skip external memory that isn't a heap */
 	if (msl->external && !msl->heap)
@@ -1435,10 +1452,7 @@ vfio_spapr_map_walk(const struct rte_memseg_list *msl,
 	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
-	if (ms->addr_64 == param->addr_64)
-		return 0;
-
-	return vfio_spapr_dma_do_map(param->vfio_container_fd, ms->addr_64, ms->iova,
+	return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 1);
 }
 
@@ -1446,7 +1460,7 @@ static int
 vfio_spapr_unmap_walk(const struct rte_memseg_list *msl,
 		const struct rte_memseg *ms, void *arg)
 {
-	struct spapr_remap_walk_param *param = arg;
+	int *vfio_container_fd = arg;
 
 	/* skip external memory that isn't a heap */
 	if (msl->external && !msl->heap)
@@ -1456,17 +1470,13 @@ vfio_spapr_unmap_walk(const struct rte_memseg_list *msl,
 	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
-	if (ms->addr_64 == param->addr_64)
-		return 0;
-
-	return vfio_spapr_dma_do_map(param->vfio_container_fd, ms->addr_64, ms->iova,
+	return vfio_spapr_dma_do_map(*vfio_container_fd, ms->addr_64, ms->iova,
 			ms->len, 0);
 }
 
 struct spapr_walk_param {
 	uint64_t window_size;
 	uint64_t hugepage_sz;
-	uint64_t addr_64;
 };
 
 static int
@@ -1484,10 +1494,6 @@ vfio_spapr_window_size_walk(const struct rte_memseg_list *msl,
 	if (ms->iova == RTE_BAD_IOVA)
 		return 0;
 
-	/* do not iterate ms we haven't mapped yet */
-	if (param->addr_64 && ms->addr_64 == param->addr_64)
-		return 0;
-
 	if (max > param->window_size) {
 		param->hugepage_sz = ms->hugepage_sz;
 		param->window_size = max;
@@ -1531,20 +1537,11 @@ vfio_spapr_create_new_dma_window(int vfio_container_fd,
 		/* try possible page_shift and levels for workaround */
 		uint32_t levels;
 
-		for (levels = 1; levels <= info.ddw.levels; levels++) {
-			uint32_t pgsizes = info.ddw.pgsizes;
-
-			while (pgsizes != 0) {
-				create->page_shift = 31 - __builtin_clz(pgsizes);
-				create->levels = levels;
-				ret = ioctl(vfio_container_fd,
-					VFIO_IOMMU_SPAPR_TCE_CREATE, create);
-				if (!ret)
-					break;
-				pgsizes &= ~(1 << create->page_shift);
-			}
-			if (!ret)
-				break;
+		for (levels = create->levels + 1;
+			ret && levels <= info.ddw.levels; levels++) {
+			create->levels = levels;
+			ret = ioctl(vfio_container_fd,
+				VFIO_IOMMU_SPAPR_TCE_CREATE, create);
 		}
 #endif
 		if (ret) {
@@ -1585,7 +1582,6 @@ vfio_spapr_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 
 	/* check if window size needs to be adjusted */
 	memset(&param, 0, sizeof(param));
-	param.addr_64 = vaddr;
 
 	/* we're inside a callback so use thread-unsafe version */
 	if (rte_memseg_walk_thread_unsafe(vfio_spapr_window_size_walk,
@@ -1610,14 +1606,9 @@ vfio_spapr_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 	if (do_map) {
 		/* re-create window and remap the entire memory */
 		if (iova + len > create.window_size) {
-			struct spapr_remap_walk_param remap_param = {
-				.vfio_container_fd = vfio_container_fd,
-				.addr_64 = vaddr,
-			};
-
 			/* release all maps before recreating the window */
 			if (rte_memseg_walk_thread_unsafe(vfio_spapr_unmap_walk,
-					&remap_param) < 0) {
+					&vfio_container_fd) < 0) {
 				RTE_LOG(ERR, EAL, "Could not release DMA maps\n");
 				ret = -1;
 				goto out;
@@ -1644,7 +1635,7 @@ vfio_spapr_dma_mem_map(int vfio_container_fd, uint64_t vaddr, uint64_t iova,
 			/* we're inside a callback, so use thread-unsafe version
 			 */
 			if (rte_memseg_walk_thread_unsafe(vfio_spapr_map_walk,
-					&remap_param) < 0) {
+					&vfio_container_fd) < 0) {
 				RTE_LOG(ERR, EAL, "Could not recreate DMA maps\n");
 				ret = -1;
 				goto out;
@@ -1691,7 +1682,6 @@ vfio_spapr_dma_map(int vfio_container_fd)
 	struct spapr_walk_param param;
 
 	memset(&param, 0, sizeof(param));
-	param.addr_64 = 0UL;
 
 	/* create DMA window from 0 to max(phys_addr + len) */
 	rte_memseg_walk(vfio_spapr_window_size_walk, &param);
-- 
2.20.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2020-02-11 11:17:43.198887974 +0000
+++ 0129-vfio-fix-mapping-failures-in-ppc64le.patch	2020-02-11 11:17:38.624005536 +0000
@@ -1,8 +1,10 @@
-From 986f2134c336a086054980903e819308dcfd43ce Mon Sep 17 00:00:00 2001
+From 80e2880c9097999377da2647bce6c195320db496 Mon Sep 17 00:00:00 2001
 From: Takeshi Yoshimura
 Date: Fri, 17 Jan 2020 13:25:55 +0900
 Subject: [PATCH] vfio: fix mapping failures in ppc64le
 
+[ upstream commit 986f2134c336a086054980903e819308dcfd43ce ]
+
 ppc64le failed when using large physical memory. I found problems in my
 two commits in the past.
 
@@ -33,7 +35,6 @@
 
 Fixes: e072d16f8920 ("vfio: fix expanding DMA area in ppc64le")
 Fixes: db90b4969e2e ("vfio: retry creating sPAPR DMA window")
-Cc: stable@dpdk.org
 
 Signed-off-by: Takeshi Yoshimura
 Reviewed-by: David Christensen
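
The commit message describes hiding the segments that are about to be mapped or unmapped from the walker by marking them "free" for the duration of the walk, then marking them "used" again. Below is a minimal sketch of that idea outside DPDK, not the code from the patch. All names (`struct seg`, `walk_used`, `unmap_cb`, `walk_hiding_range`) are hypothetical, and a plain array with a `used` flag stands in for `rte_fbarray`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory segment tracked in an array. */
struct seg {
	int used;   /* 1 if the walker should visit this segment */
	int mapped; /* 1 if a (simulated) DMA mapping exists */
};

typedef int (*seg_walk_fn)(struct seg *s, void *arg);

/* Visit only segments flagged as used; stop at the first non-zero return. */
static int
walk_used(struct seg *segs, size_t n, seg_walk_fn fn, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (!segs[i].used)
			continue;
		ret = fn(&segs[i], arg);
		if (ret != 0)
			return ret;
	}
	return 0;
}

/* Simulated unmap: fails when asked to unmap a segment that has no mapping. */
static int
unmap_cb(struct seg *s, void *arg)
{
	int *visited = arg;

	if (!s->mapped)
		return -1;
	s->mapped = 0;
	(*visited)++;
	return 0;
}

/*
 * Hide segments [first, first + count) from the walker, run the walk,
 * then flag them as used again. Like the patch, this assumes the hidden
 * segments were flagged as used before the call.
 */
static int
walk_hiding_range(struct seg *segs, size_t n, size_t first, size_t count,
		seg_walk_fn fn, void *arg)
{
	int ret;

	assert(first <= n && count <= n - first);
	for (size_t i = first; i < first + count; i++)
		segs[i].used = 0;
	ret = walk_used(segs, n, fn, arg);
	for (size_t i = first; i < first + count; i++)
		segs[i].used = 1;
	return ret;
}
```

With four segments where the last two are allocated but not yet mapped, a plain `walk_used` with `unmap_cb` fails on the third segment, while `walk_hiding_range(segs, 4, 2, 2, unmap_cb, &visited)` visits only the two mapped segments and leaves the hidden ones flagged as used afterwards.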