From: Pavel Vazharov
Date: Sat, 24 Jul 2021 17:54:29 +0300
To: users@dpdk.org
Subject: [dpdk-users] rte_malloc behavior

Hi,

A short intro to the original cause of my problem. We have an application
where we run the FreeBSD networking stack on top of DPDK. It's based on
F-stack, but with lots of modifications at this point. There are situations
where we send lots of TCP data in a short period of time, mostly as 16 KB
blocks. The FreeBSD networking stack internally splits these blocks into
so-called jumbo clusters (4 KB each) before putting them into the TCP socket
buffers. All allocations needed by the FreeBSD stack are redirected to
rte_malloc. During such TCP sends I observed peak delays in the sendmsg()
call and tracked these delays down to the rte_malloc calls.
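For context, the redirection itself is just a thin shim around
rte_malloc/rte_free. A minimal sketch of the idea (the wrapper names here
are illustrative, not our actual glue code):

#include <cstddef>
#include <rte_malloc.h>

// Hypothetical wrappers that the stack's allocator hooks point at.
static void* stack_malloc(std::size_t size, unsigned align) noexcept
{
    // rte_malloc(type, size, align): the type tag may be NULL and
    // align == 0 means the default (cache-line) alignment.
    return rte_malloc(nullptr, size, align);
}

static void stack_free(void* ptr) noexcept
{
    rte_free(ptr); // rte_free(NULL) is a no-op, like free()
}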
To isolate the issue further, I did some tests with the following piece of
C++ code:

#include <chrono>
#include <vector>
#include <fmt/core.h>
#include <rte_malloc.h>

[[gnu::noinline]] static void
test_allocations(int allocations, int size, int align) noexcept
{
    using namespace std::chrono;
    std::vector<void*> mem(allocations);
    const auto beg = high_resolution_clock::now();
    for (int i = 0; i < allocations; ++i) {
        mem[i] = rte_malloc(nullptr, size, align);
        X3ME_ENFORCE(mem[i]); // internal assert macro
    }
    const auto end = high_resolution_clock::now();
    fmt::print(
        "Allocations:{} Size:{} Align:{} Time_msecs:{} Avg_time_usecs:{}\n",
        allocations, size, align,
        duration_cast<milliseconds>(end - beg).count(),
        duration_cast<microseconds>(end - beg).count() / allocations);
    for (void* m : mem)
        rte_free(m);
}

The results show big delays in rte_malloc when we ask for 1 KB or 4 KB
blocks. The delays are not present when the size is not an exact power of
two (the 1040- and 4112-byte cases below):

Allocations:4096 Size:4096 Align:4096 Time_msecs:330 Avg_time_usecs:80
Allocations:16384 Size:4096 Align:4096 Time_msecs:8724 Avg_time_usecs:532
Allocations:32768 Size:4096 Align:4096 Time_msecs:38291 Avg_time_usecs:1168
Allocations:4096 Size:4112 Align:4096 Time_msecs:12 Avg_time_usecs:3
Allocations:16384 Size:4112 Align:4096 Time_msecs:45 Avg_time_usecs:2
Allocations:32768 Size:4112 Align:4096 Time_msecs:83 Avg_time_usecs:2
Allocations:4096 Size:1024 Align:1024 Time_msecs:244 Avg_time_usecs:59
Allocations:16384 Size:1024 Align:1024 Time_msecs:4428 Avg_time_usecs:270
Allocations:32768 Size:1024 Align:1024 Time_msecs:26901 Avg_time_usecs:820
Allocations:4096 Size:1040 Align:1024 Time_msecs:4 Avg_time_usecs:1
Allocations:16384 Size:1040 Align:1024 Time_msecs:16 Avg_time_usecs:1
Allocations:32768 Size:1040 Align:1024 Time_msecs:30 Avg_time_usecs:0

And just for reference, the speed of the same allocations using the standard
aligned_alloc/free API instead of rte_malloc/rte_free:

Allocations:32768 Size:1024 Align:1024 Time_msecs:66 Avg_time_usecs:2
Allocations:32768 Size:4096 Align:4096 Time_msecs:118 Avg_time_usecs:3

As far as I know, some allocators are inefficient for particular allocation
sizes, but I didn't expect such a big difference. Am I missing something in
the documentation that explains this behavior? Should I report it to the dev
mailing list?

Regards,
Pavel.
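P.S. For completeness, the aligned_alloc/free reference numbers come from
the same measurement loop with only the allocator swapped. Roughly (a
sketch, not the exact harness):

#include <cstdlib>

[[gnu::noinline]] static void
test_allocations_libc(int allocations, std::size_t size, std::size_t align) noexcept
{
    using namespace std::chrono;
    std::vector<void*> mem(allocations);
    const auto beg = high_resolution_clock::now();
    for (int i = 0; i < allocations; ++i) {
        // aligned_alloc requires size to be a multiple of align, which
        // holds for the 1024/1024 and 4096/4096 cases measured above.
        mem[i] = std::aligned_alloc(align, size);
    }
    const auto end = high_resolution_clock::now();
    fmt::print(
        "Allocations:{} Size:{} Align:{} Time_msecs:{} Avg_time_usecs:{}\n",
        allocations, size, align,
        duration_cast<milliseconds>(end - beg).count(),
        duration_cast<microseconds>(end - beg).count() / allocations);
    for (void* m : mem)
        std::free(m);
}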