From: Antonio Di Bacco
Date: Wed, 25 May 2022 15:33:34 +0200
Subject: Re: Optimizing memory access with DPDK allocated memory
To: "Kinsella, Ray"
Cc: Stephen Hemminger, "users@dpdk.org"

Wonderful tool, now it is completely clear: I do not have a bottleneck on
the DDR itself but on the core-to-DDR interface.

Single core results (values in MB/s):
Command line parameters: mlc --max_bandwidth -H -k3
ALL Reads        :  9239.05
3:1 Reads-Writes : 13348.68
2:1 Reads-Writes : 14360.44
1:1 Reads-Writes : 13792.73

Two cores:
Command line parameters: mlc --max_bandwidth -H -k3-4
ALL Reads        : 24666.55
3:1 Reads-Writes : 30905.30
2:1 Reads-Writes : 32256.26
1:1 Reads-Writes : 37349.44

Eight cores:
Command line parameters: mlc --max_bandwidth -H -k3-10
ALL Reads        : 78109.94
3:1 Reads-Writes : 62105.06
2:1 Reads-Writes : 59628.81
1:1 Reads-Writes : 55320.34
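A quick way to cross-check the single-core figure without MLC is a loop
like the one below. This is only a rough sketch, not the MLC methodology:
it uses plain malloc'ed memory instead of hugepages, and the 1 GiB buffer
size and 8 rounds are arbitrary choices.

/* Rough single-core write-bandwidth check: fill a buffer much larger
 * than L3 and time it. Sizes and round count are arbitrary. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_SIZE (1UL << 30)   /* 1 GiB, well beyond typical L3 sizes */
#define ROUNDS   8

int main(void)
{
    char *buf = malloc(BUF_SIZE);
    if (!buf)
        return 1;
    memset(buf, 0, BUF_SIZE);   /* touch pages so timing excludes faults */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ROUNDS; i++)
        memset(buf, i, BUF_SIZE);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("single-core write bandwidth: %.2f GB/s\n",
           (double)BUF_SIZE * ROUNDS / sec / 1e9);
    free(buf);
    return 0;
}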
On Wed, May 25, 2022 at 12:55 PM Kinsella, Ray wrote:
>
> Hi Antonio,
>
> If it is an Intel platform you are using, you can take a look at the
> Intel Memory Latency Checker:
> https://www.intel.com/content/www/us/en/developer/articles/tool/intelr-memory-latency-checker.html
>
> (Don't be fooled by the name, it does measure bandwidth.)
>
> Ray K
>
> -----Original Message-----
> From: Antonio Di Bacco
> Sent: Wednesday 25 May 2022 08:30
> To: Stephen Hemminger
> Cc: users@dpdk.org
> Subject: Re: Optimizing memory access with DPDK allocated memory
>
> Just to add some more info that could possibly be useful to someone.
> Even if a processor has many memory channels, there is another
> parameter to take into consideration: a single core cannot exploit all
> of the available memory bandwidth. For example, for DDR4-2933 with 4
> channels, the peak memory bandwidth is 2933 MT/s x 8 bytes (bus width)
> x 4 channels = 93,866.88 MB/s, or about 94 GB/s, but a single core
> (according to my tests with a DPDK process writing a 1GB hugepage)
> reaches only about 12 GB/s (with a block size exceeding the L3 cache
> size).
>
> Can anyone confirm that?
>
> On Mon, May 23, 2022 at 3:16 PM Antonio Di Bacco wrote:
> >
> > Got feedback from a guy working on HPC with DPDK and he told me that
> > with dpdk mem-test (don't know where to find it) I should be doing
> > 16 GB/s per DDR4-2666 channel. In my case, with 6 channels, I should
> > be doing 90 GB/s .... that would be amazing!
> >
> > On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco wrote:
> > >
> > > I read a couple of articles
> > > (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> > > and
> > > https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
> > > and I understood a little bit more.
> > >
> > > If the Xeon memory controller is able to spread contiguous memory
> > > accesses onto different channels in hardware (as Stephen correctly
> > > stated), then how can the DPDK -n (memory channels) option benefit
> > > an application?
> > > I also coded a test application that writes a 1GB hugepage and
> > > measures the time needed, but after equipping two additional DIMMs
> > > on two unused channels of my six-channel motherboard (X11DPi-NT) I
> > > didn't observe any improvement. This is strange, because adding two
> > > channels to the four already equipped should make a noticeable
> > > difference.
> > >
> > > For reference, this is the small program for allocating and writing
> > > memory:
> > > https://github.com/adibacco/simple_mp_mem_2
> > > and the results with 4 memory channels:
> > > https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing
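For readers who do not want to dig through the repository, a minimal
sketch of the kind of 1GB-hugepage write test described above could look
like the following. This is an illustration only, not the code from
simple_mp_mem_2: it assumes 1GB hugepages are configured for DPDK
(e.g. default_hugepagesz=1G hugepages=1 on the kernel command line), and
the memzone name and fill value are arbitrary.

/* Time a single-core write pass over a 1 GiB DPDK memzone backed by a
 * 1GB hugepage. Run with the usual EAL arguments (e.g. -l, -n). */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <rte_eal.h>
#include <rte_memzone.h>
#include <rte_cycles.h>

#define ZONE_SIZE (1UL << 30)   /* 1 GiB */

int main(int argc, char **argv)
{
    if (rte_eal_init(argc, argv) < 0)
        rte_exit(EXIT_FAILURE, "EAL init failed\n");

    /* Ask for a zone backed by a 1GB hugepage on any socket. */
    const struct rte_memzone *mz = rte_memzone_reserve("bw_test",
            ZONE_SIZE, SOCKET_ID_ANY, RTE_MEMZONE_1GB);
    if (mz == NULL)
        rte_exit(EXIT_FAILURE, "cannot reserve 1GB memzone\n");

    uint64_t start = rte_rdtsc();
    memset(mz->addr, 0xa5, ZONE_SIZE);          /* single-core write pass */
    uint64_t cycles = rte_rdtsc() - start;

    double sec = (double)cycles / rte_get_tsc_hz();
    printf("wrote %lu bytes in %.3f s -> %.2f GB/s\n",
           ZONE_SIZE, sec, ZONE_SIZE / sec / 1e9);
    return 0;
}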
> > > On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger wrote:
> > > >
> > > > On Fri, 20 May 2022 10:34:46 +0200 Antonio Di Bacco wrote:
> > > >
> > > > > Let us say I have two memory channels, each one with its own
> > > > > 16GB memory module. I suppose the first memory channel will be
> > > > > used when addressing physical memory in the range 0 to
> > > > > 0x4 0000 0000, and the second when addressing physical memory
> > > > > in the range 0x4 0000 0000 to 0x7 ffff ffff. Correct?
> > > > > Now, I need to have a 2GB buffer with one "writer" and one
> > > > > "reader"; the writer writes to one half of the buffer (call it
> > > > > A) and, in the meantime, the reader reads from the other half
> > > > > (B). When the writer finishes writing its half buffer (A), it
> > > > > signals the reader and they swap: the reader starts to read
> > > > > from A and the writer starts to write to B.
> > > > > If I allocate the whole buffer (on two 1GB hugepages) across
> > > > > the two memory channels, so that one half of the buffer sits at
> > > > > the end of the first channel while the other half sits at the
> > > > > start of the second memory channel, would this increase
> > > > > performance compared to the whole buffer being allocated within
> > > > > the same memory channel?
> > > >
> > > > Most systems just interleave memory chips based on the number of
> > > > filled slots. This is handled by the BIOS before the kernel even
> > > > starts. DPDK has a number-of-memory-channels parameter, and what
> > > > it does is try to optimize memory allocation by spreading
> > > > allocations across channels.
> > > >
> > > > Looks like you are inventing your own limited version of what
> > > > memif does.
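For reference, the writer/reader half-buffer swap described in the quoted
question can be prototyped without DPDK at all. The sketch below is only
an illustration of the hand-off scheme, using plain malloc'ed memory and a
pthread condition variable; the sizes, iteration count, and names are
made up, and it ignores the hugepage/channel-placement question entirely.

/* Minimal sketch of the writer/reader half-buffer (A/B) swap. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HALF_SIZE  (64UL << 20)   /* 64 MiB per half, for the sketch */
#define ITERATIONS 16

static char *half[2];             /* half[0] = A, half[1] = B */
static int ready = -1;            /* index of the half ready to be read */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

static void *reader(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);
        while (ready < 0)                 /* wait for the writer's signal */
            pthread_cond_wait(&cond, &lock);
        int idx = ready;
        ready = -1;
        pthread_cond_signal(&cond);       /* tell the writer the half was taken */
        pthread_mutex_unlock(&lock);

        volatile char sink = 0;           /* read one byte per cache line */
        for (size_t j = 0; j < HALF_SIZE; j += 64)
            sink ^= half[idx][j];
        (void)sink;
    }
    return NULL;
}

int main(void)
{
    half[0] = malloc(HALF_SIZE);
    half[1] = malloc(HALF_SIZE);
    if (!half[0] || !half[1])
        return 1;

    pthread_t rd;
    pthread_create(&rd, NULL, reader, NULL);

    for (int i = 0; i < ITERATIONS; i++) {
        int idx = i & 1;                  /* alternate between A and B */
        memset(half[idx], i, HALF_SIZE);  /* writer fills one half */

        pthread_mutex_lock(&lock);
        while (ready >= 0)                /* wait until the previous half was taken */
            pthread_cond_wait(&cond, &lock);
        ready = idx;                      /* hand the filled half to the reader */
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    pthread_join(rd, NULL);
    free(half[0]);
    free(half[1]);
    return 0;
}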