Hi,
We found an issue with newer kernels (5.13+), which ship with newer OSes (Ubuntu 22, Rocky 9, and Ubuntu 20 with the 5.15 kernel): a 2M hugepage that had been allocated for DPDK was migrated (moved to a different physical page) when a 1G hugepage was allocated.
From our reading of the kernel commits, this started with commit ae37c7ff79f1f030e28ec76c46ee032f8fd07607
("mm: make alloc_contig_range handle in-use hugetlb pages").
To us this looked like memory corruption, and in some cases the rings were moved from their physical location so that communication was no longer possible.
Has anyone else hit this issue, and what mitigations are available?
We are currently looking at using a kernel driver to pin the pages, but I expect this issue will affect others and that a more general approach is needed.
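For anyone who wants to check whether they are affected, one possible way to detect the migration from user space is to read the page frame number (PFN) behind a virtual address from /proc/self/pagemap and compare it over time. The following is only a minimal sketch, not what we run in production: the helper names pagemap_entry_to_pfn() and virt_to_pfn() are made up for illustration, and it assumes the process has CAP_SYS_ADMIN, since recent kernels report the PFN as zero to unprivileged readers.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Pagemap entry layout (Documentation/admin-guide/mm/pagemap.rst):
 * bit 63 = page present in RAM, bits 0-54 = PFN when present. */
#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1)

/* Hypothetical helper: decode one 64-bit pagemap entry.
 * Returns 0 and stores the PFN on success, -1 if the page is not
 * present or the PFN is hidden (reads as zero without CAP_SYS_ADMIN). */
static int pagemap_entry_to_pfn(uint64_t entry, uint64_t *pfn)
{
    if (!(entry & PM_PRESENT))
        return -1;
    *pfn = entry & PM_PFN_MASK;
    return *pfn ? 0 : -1;
}

/* Hypothetical helper: look up the PFN currently backing addr.
 * pagemap is indexed in base-page units even for hugetlb mappings,
 * so this returns the PFN of the base page that contains addr. */
static int virt_to_pfn(const void *addr, uint64_t *pfn)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uint64_t entry;
    uint64_t offset;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return -1;

    /* One 8-byte entry per virtual base page. */
    offset = ((uint64_t)(uintptr_t)addr / (uint64_t)page_size) * sizeof(entry);

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    n = pread(fd, &entry, sizeof(entry), (off_t)offset);
    close(fd);
    if (n != (ssize_t)sizeof(entry))
        return -1;

    return pagemap_entry_to_pfn(entry, pfn);
}
```

The idea would be to call virt_to_pfn() on the base address of a ring once after setup, store the result, and call it again periodically or after a 1G allocation. A different PFN for the same virtual address means the page was migrated. This only detects the problem; it does not prevent it.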
Thanks,
Baruch
--
Baruch Even
Platform Technical Lead, WEKA