I have tested MAP_LOCKED and it doesn't help in this case. I do intend to report this to the kernel developers, but wanted to check first whether others have already hit it.

On Tue, May 30, 2023 at 4:35 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
On Sun, 28 May 2023 23:07:40 +0300
Baruch Even <baruch@weka.io> wrote:

> Hi,
>
> We found an issue with newer kernels (5.13+), as shipped on newer OSes
> (Ubuntu 22, Rocky 9, Ubuntu 20 with kernel 5.15), where a 2M page that was
> allocated for DPDK was migrated (moved to another physical page) when a
> 1G page was allocated.
>
> From our reading of the kernel history, this started with commit
> ae37c7ff79f1f030e28ec76c46ee032f8fd07607
>     mm: make alloc_contig_range handle in-use hugetlb pages
>
> To us this looked like memory corruption, and in some cases the rings
> were moved from their physical location so that communication was no
> longer possible.
>
> I wanted to ask whether anyone else has hit this issue, and what
> mitigations are available.
>
> We are currently looking at using a kernel driver to pin the pages, but I
> expect that this issue will affect others and that a more general approach
> is needed.
>
> Thanks,
> Baruch
>
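One way to confirm that a page was migrated, as described above, is to record its physical frame number (PFN) and compare it later, using /proc/self/pagemap (the same interface DPDK's rte_mem_virt2phy() reads). The following is a minimal sketch: the helper names pagemap_entry() and page_was_migrated() are hypothetical, and the bit layout (PFN in bits 0-54, "present" in bit 63) follows the kernel's pagemap documentation. Without CAP_SYS_ADMIN the PFN field reads as 0 (Linux 4.2+), so the comparison is only meaningful in a privileged process.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit layout of a /proc/<pid>/pagemap entry. Entries are indexed by base
 * (PAGE_SIZE) pages, even for addresses inside a 2M or 1G hugepage. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1) /* bits 0-54: page frame number */
#define PM_PRESENT  (UINT64_C(1) << 63)       /* bit 63: page present in RAM */

/* Hypothetical helper: read the raw pagemap entry for virt.
 * Returns 0 if pagemap cannot be opened or read. */
static uint64_t pagemap_entry(const void *virt)
{
    uint64_t entry = 0;
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return 0;

    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return 0;

    /* One 8-byte entry per base page of virtual address space. */
    off_t offset = (off_t)(((uintptr_t)virt / (uintptr_t)page_size) * sizeof(entry));
    if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
        entry = 0;
    close(fd);
    return entry;
}

/* Hypothetical helper: nonzero if the page backing virt is present and its
 * PFN differs from old_pfn, i.e. it now lives at another physical address.
 * Returns 0 for a non-present page (including one mid-migration). */
static int page_was_migrated(const void *virt, uint64_t old_pfn)
{
    uint64_t entry = pagemap_entry(virt);
    if (!(entry & PM_PRESENT))
        return 0;
    return (entry & PM_PFN_MASK) != old_pfn;
}
```

In use, one would save `pagemap_entry(ring) & PM_PFN_MASK` right after allocating a ring, then call page_was_migrated() after a 1G page is allocated; a nonzero result indicates the page moved under DPDK.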

The fix might be as simple as asking the kernel to lock the mmap().

diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
index 581d9dfc91eb..989c69387233 100644
--- a/lib/eal/linux/eal_hugepage_info.c
+++ b/lib/eal/linux/eal_hugepage_info.c
@@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t mem_size, int flags)
                return NULL;
        }
        retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
-                       MAP_SHARED, fd, 0);
+                       MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
+
        close(fd);
        return retval == MAP_FAILED ? NULL : retval;
 }
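To try the same mmap() flags outside DPDK, here is a minimal standalone sketch. The name map_shared_locked() is hypothetical, and the open()/ftruncate() steps are assumptions, since the patch context does not show them. MAP_LOCKED gives the mapping mlock() semantics: the range is prefaulted and kept resident rather than swapped out. It does not forbid migration, since the kernel can still move mlocked pages (for example during compaction), and hugetlb pages are never swapped in the first place, which is consistent with the report that MAP_LOCKED does not help here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical standalone version of the patched mapping: open (or create)
 * filename, size it to mem_size, and map it shared with MAP_LOCKED.
 * MAP_SHARED_VALIDATE (Linux 4.15+) makes mmap() fail with EOPNOTSUPP on
 * unknown flags instead of silently ignoring them. Returns NULL on failure
 * with errno preserved from the call that failed. */
static void *map_shared_locked(const char *filename, size_t mem_size, int flags)
{
    int fd = open(filename, flags, 0600);
    if (fd < 0)
        return NULL;

    if (ftruncate(fd, (off_t)mem_size) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    /* Like mlock(): keeps pages resident (no swap-out), subject to
     * RLIMIT_MEMLOCK, but does not stop the kernel from migrating them. */
    void *p = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
    int saved = errno;
    close(fd); /* the mapping holds its own reference to the file */
    errno = saved;
    return p == MAP_FAILED ? NULL : p;
}
```

Because locking only guarantees residency, verifying the physical address over time (for example via /proc/self/pagemap) is still needed to tell whether pages stay put.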


--
Baruch Even
Platform Technical Lead,  WEKA
baruch@weka.io www.weka.io