I have tested MAP_LOCKED and it doesn't help in this case. I do intend to report this to the kernel developers, but I was wondering whether others had hit it first.
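One way to confirm that a page has moved is to sample its physical frame number from /proc/self/pagemap before and after the 1G allocation. The following is only a minimal sketch under stated assumptions, not the test that was actually run: the helper names pagemap_pfn() and pagemap_read() are hypothetical, the bit layout is the one described in the kernel's pagemap documentation, and the PFN field reads as zero unless the process has CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit layout of a 64-bit pagemap entry (kernel pagemap documentation). */
#define PAGEMAP_PRESENT  (1ULL << 63)        /* page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/*
 * Hypothetical helper: extract the PFN from a raw pagemap entry.
 * Returns 0 if the page is not present (or the kernel hid the PFN).
 */
static uint64_t pagemap_pfn(uint64_t entry)
{
	if (!(entry & PAGEMAP_PRESENT))
		return 0;
	return entry & PAGEMAP_PFN_MASK;
}

/*
 * Hypothetical helper: read the raw pagemap entry that covers vaddr.
 * Returns 0 on success and -1 on any error.
 */
static int pagemap_read(const void *vaddr, uint64_t *entry)
{
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 8-byte entry per base page, indexed by virtual page number. */
	offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size * sizeof(*entry));
	n = pread(fd, entry, sizeof(*entry), offset);
	close(fd);

	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}
```

Calling pagemap_read() on the first byte of a ring, decoding the result with pagemap_pfn(), and repeating after the 1G page is allocated should show two different non-zero frame numbers if the 2M page was migrated.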
On Sun, 28 May 2023 23:07:40 +0300
Baruch Even <baruch@weka.io> wrote:
> Hi,
>
> We found an issue with newer kernels (5.13+) that are found on newer OSes
> (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that was
> allocated for DPDK was migrated (moved into another physical page) when a
> 1G page was allocated.
>
> From our reading of the kernel commits this started with commit
> ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> mm: make alloc_contig_range handle in-use hugetlb pages
>
> This caused what looked like memory corruptions to us and cases where the
> rings were moved from their physical location and communication was no
> longer possible.
>
> I wanted to ask if anyone else hit this issue and what mitigations are
> available?
>
> We are currently looking at using a kernel driver to pin the pages but I
> expect that this issue will affect others and that a more general approach
> is needed.
>
> Thanks,
> Baruch
>
The fix might be as simple as asking the kernel to lock the mmap().
diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
index 581d9dfc91eb..989c69387233 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t mem_size, int flags)
 		return NULL;
 	}
 	retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
-			MAP_SHARED, fd, 0);
+			MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
+
 	close(fd);
 	return retval == MAP_FAILED ? NULL : retval;
 }
--
Baruch Even
Platform Technical Lead, WEKA