From: David Marchand
Date: Mon, 9 Dec 2024 18:40:09 +0100
Subject: Re: [PATCH 0/3] Defer lcore variables allocation
To: Mattias Rönnblom
Cc: dev@dpdk.org, thomas@monjalon.net, frode.nordahl@canonical.com, mattias.ronnblom@ericsson.com

On Mon, Dec 9, 2024 at 4:39 PM Mattias Rönnblom wrote:
> On 2024-12-09 12:03, David Marchand wrote:
> > On Fri, Dec 6, 2024 at 12:02 PM Mattias Rönnblom wrote:
> >> On 2024-12-05 18:57, David Marchand wrote:
> >>> As I had reported in rc2, the lcore variables allocation has a
> >>> noticeable impact on applications consuming DPDK, even when such
> >>> applications do not use DPDK, or do not use the features associated
> >>> with some lcore variables.
> >>>
> >>> While the amount has been reduced in a rush before rc2,
> >>> there are still cases where the increased memory footprint is noticed,
> >>> like in scaling tests.
> >>> See https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2090931
> >>>
> >>
> >> What this bug report fails to mention is that it only affects
> >> applications using locked memory.
> >
> > - By locked memory, are you referring to mlock() and friends?
> > No ovsdb binary calls them; only the datapath cares about mlocking.
> >
> >
> > - At a minimum, I understand the lcore var change introduced an
> > increase in memory of 4kB * 128 (getpagesize() * RTE_MAX_LCORE),
> > since lcore_var_alloc() calls memset() of the lcore var size, for
> > every lcore.
> >
>
> Yes, that is my understanding. It's also consistent with the
> measurements I've posted on this list.
>
> > In this unit test, where 1000 processes are kept alive in parallel,
> > this means memory consumption increased by 512k * 1000, so ~500M at
> > least.
> > This amount of memory is probably significant in a resource-restrained
> > env like an (Ubuntu) CI.
> >
> >
>
> I wouldn't expect thousands of concurrent processes in a
> resource-constrained system. Sounds wasteful indeed. But sure, there may
> well be scenarios where this makes sense.
>
> > - I went and traced this unit test on my laptop by monitoring
> > kmem:mm_page_alloc, though there may be a better metric when it comes
> > to memory consumption.
> >
> > # dir=build; perf stat -e kmem:mm_page_alloc -- tests/testsuite -C
> > $dir/tests AUTOTEST_PATH=$dir/utilities:$dir/vswitchd:$dir/ovsdb:$dir/vtep:$dir/tests:$dir/ipsec:: 2154
> >
> > Which gives:
> > - 1 635 489 kmem:mm_page_alloc for v23.11
> > - 5 777 043 kmem:mm_page_alloc for v24.11
> >
>
> Interesting. What is vm.overcommit_memory set to?

# cat /proc/sys/vm/overcommit_memory
0

And I am not sure what is being used in Ubuntu CI.

But the problem is, in the end, simpler.

[snip]

>
> > There is a 4M difference, where I would expect 128k.
> > So something more happens than a simple page allocation per lcore,
> > though I fail to understand what.

Isolating the perf events for one process of this huge test, I counted
4878 page alloc calls.
Of them, 4108 had rte_lcore_var_alloc in their call stack, which
is unexpected.

After spending some time reading glibc, I noticed alloc_perturb().
*bingo*, I remembered that OVS unit tests are run with MALLOC_PERTURB_
(=165 after double-checking OVS sources).

"""
Tunable: glibc.malloc.perturb
This tunable supersedes the MALLOC_PERTURB_ environment variable and is
identical in features.

If set to a non-zero value, memory blocks are initialized with values
depending on some low order bits of this tunable when they are allocated
(except when allocated by calloc) and freed. This can be used to debug the
use of uninitialized or freed heap memory. Note that this option does not
guarantee that the freed block will have any specific values. It only
guarantees that the content the block had before it was freed will be
overwritten.

The default value of this tunable is ‘0’.
"""

Now, reproducing this outside of the test:

$ perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help >/dev/null

 Performance counter stats for './build/ovsdb/ovsdb-client --help':

               810      kmem:mm_page_alloc

       0,003277941 seconds time elapsed

       0,003260000 seconds user
       0,000000000 seconds sys

$ MALLOC_PERTURB_=165 perf stat -e kmem:mm_page_alloc -- ./build/ovsdb/ovsdb-client --help >/dev/null

 Performance counter stats for './build/ovsdb/ovsdb-client --help':

             4 789      kmem:mm_page_alloc

       0,008766171 seconds time elapsed

       0,000976000 seconds user
       0,007794000 seconds sys

So the issue is not triggered by mlock'd memory, but by the whole 16M
buffer for lcore variables being touched by a glibc debugging feature.
With 1000 processes in the Ubuntu CI, this translated to requesting 16G.

> >
> > Btw, just focusing on lcore var, I did two more tests:
> > - 1 606 998 kmem:mm_page_alloc for v24.11 + revert all lcore var changes.
> > - 1 634 606 kmem:mm_page_alloc for v24.11 + current series with
> > postponed allocations.
> >
> >
>
> If one moves initialization to shared object constructors (from having
> been at some later time), and then ends up not running that
> initialization code at all (e.g., DPDK is not used), those code pages
> will increase RSS. That might well hurt more than the lcore variable
> memory itself, depending on how much code is run.
>
> However, such read-only pages can be replaced with something more useful
> if the system is under memory pressure, so they aren't really a big
> issue as far as (real) memory footprint is concerned.
>
> Just linking to DPDK (and its dependencies) already came with a 1-7 MB
> RSS penalty, prior to lcore variables. I wonder how much of that goes
> away if all RTE_INIT() type constructors are removed.

Regardless of the RSS change, completely removing constructors is not simple.
Postponing *all* existing constructors in DPDK code would be an ABI
breakage, as RTE_INIT constructors have a priority notion and
application callbacks using RTE_INIT may rely on it.
Just deferring "unprioritised" constructors would be doable on paper,
but the location in rte_eal_init where those are deferred would have
to be carefully evaluated (with -d plugins in mind).

-- 
David Marchand