From: Baruch Even
Date: Tue, 30 May 2023 16:51:34 +0300
Subject: Re: Hugepage migration
To: stephen@networkplumber.org
Cc: dpdk-dev
In-Reply-To: <20230529183514.1febf224@hermes.local>
List-Id: DPDK patches and discussions

I have tested MAP_LOCKED; it doesn't help in this case. I do intend to report this to the kernel developers, but was wondering whether others have hit this first.
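
For anyone who wants to check whether a mapping is affected: locking keeps pages resident in RAM, but does not by itself stop the kernel from migrating them to a different physical frame. One way to observe a move from user space is to read the page frame number (PFN) from /proc/self/pagemap before and after the 1G page allocation. The following is a minimal sketch only, not DPDK code: the helper names pagemap_entry_to_pfn() and virt_to_pfn() are hypothetical, and it assumes the process has CAP_SYS_ADMIN, since the kernel reports the PFN as 0 otherwise.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout of a 64-bit /proc/<pid>/pagemap entry (per the kernel pagemap docs). */
#define PAGEMAP_PRESENT  (1ULL << 63)        /* bit 63: page is present in RAM */
#define PAGEMAP_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: page frame number */

/* Hypothetical helper: extract the PFN, or 0 if the page is not present. */
static uint64_t pagemap_entry_to_pfn(uint64_t entry)
{
	if (!(entry & PAGEMAP_PRESENT))
		return 0;
	return entry & PAGEMAP_PFN_MASK;
}

/*
 * Hypothetical helper: PFN currently backing vaddr in this process.
 * Returns 0 on error, if the page is not present, or if the kernel
 * hides the PFN (it is reported as 0 without CAP_SYS_ADMIN).
 */
static uint64_t virt_to_pfn(const void *vaddr)
{
	long page_size = sysconf(_SC_PAGESIZE);
	uint64_t entry = 0;
	off_t offset;
	int fd;

	assert(vaddr != NULL);
	if (page_size <= 0)
		return 0;

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return 0;

	/* One 8-byte entry per base page, indexed by virtual page number. */
	offset = (off_t)((uintptr_t)vaddr / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);
	if (pread(fd, &entry, sizeof(entry), offset) != (ssize_t)sizeof(entry))
		entry = 0;
	close(fd);

	return pagemap_entry_to_pfn(entry);
}
```

To use it, record virt_to_pfn() for an address inside the 2M page once the memory is set up, then call it again after the 1G page has been allocated. Two different non-zero values mean the page was moved to another physical location.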

On Tue, May 30, 2023 at 4:35 AM Stephen Hemminger <stephen@networkplumber.org> wrote:
> On Sun, 28 May 2023 23:07:40 +0300
> Baruch Even <baruch@weka.io> wrote:
>
> > Hi,
> >
> > We found an issue with newer kernels (5.13+) that are found on newer OSes
> > (Ubuntu22, Rocky9, Ubuntu20 with kernel 5.15) where a 2M page that was
> > allocated for DPDK was migrated (moved into another physical page) when a
> > 1G page was allocated.
> >
> > From our reading of the kernel commits this started with commit
> > ae37c7ff79f1f030e28ec76c46ee032f8fd07607
> >     mm: make alloc_contig_range handle in-use hugetlb pages
> >
> > This caused what looked like memory corruptions to us and cases where the
> > rings were moved from their physical location and communication was no
> > longer possible.
> >
> > I wanted to ask if anyone else hit this issue and what mitigations are
> > available?
> >
> > We are currently looking at using a kernel driver to pin the pages but I
> > expect that this issue will affect others and that a more general approach
> > is needed.
> >
> > Thanks,
> > Baruch
> >

> Fix might be as simple as asking kernel to lock the mmap().
>
> diff --git a/lib/eal/linux/eal_hugepage_info.c b/lib/eal/linux/eal_hugepage_info.c
> index 581d9dfc91eb..989c69387233 100644
> --- a/lib/eal/linux/eal_hugepage_info.c
> +++ b/lib/eal/linux/eal_hugepage_info.c
> @@ -48,7 +48,8 @@ map_shared_memory(const char *filename, const size_t mem_size, int flags)
>                 return NULL;
>         }
>         retval = mmap(NULL, mem_size, PROT_READ | PROT_WRITE,
> -                       MAP_SHARED, fd, 0);
> +                       MAP_SHARED_VALIDATE | MAP_LOCKED, fd, 0);
> +
>         close(fd);
>         return retval == MAP_FAILED ? NULL : retval;
>  }


--
Baruch Even
Platform Technical Lead, WEKA
E baruch@weka.io  W www.weka.io