From: Antonio Di Bacco
Date: Wed, 25 May 2022 09:30:18 +0200
Subject: Re: Optimizing memory access with DPDK allocated memory
To: Stephen Hemminger
Cc: users@dpdk.org

Just to add some more info that could be useful to someone.

Even if a processor has many memory channels, there is another parameter
to take into consideration: a single core cannot exploit all the
available memory bandwidth. For example, for DDR4 at 2933 MT/s with 4
channels, the theoretical memory bandwidth is

  2933 x 8 (bytes of bus width) x 4 (channels) = 93,856 MB/s

or about 94 GB/s, but a single core (according to my tests with a DPDK
process writing a 1GB hugepage) reaches only about 12 GB/s (with a block
size exceeding the L3 cache size).

Can anyone confirm that?

On Mon, May 23, 2022 at 3:16 PM Antonio Di Bacco wrote:
>
> Got feedback from a guy working on HPC with DPDK and he told me that
> with dpdk mem-test (don't know where to find it) I should be doing
> 16GB/s with DDR4 (2666) per channel. In my case with 6 channels I
> should be doing 90GB/s .... that would be amazing!
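The peak-bandwidth arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the theoretical figure (transfer rate x bus width x channels), assuming a 64-bit (8-byte) data bus per channel and decimal megabytes; the function name is made up for illustration, and it says nothing about what a real core or DPDK process will achieve.

```python
def peak_bandwidth_mb_s(transfer_rate_mt_s, channels, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in MB/s (decimal megabytes).

    Assumes every channel has a bus_width_bytes-wide data bus and that
    all channels are populated and perfectly interleaved.
    """
    return transfer_rate_mt_s * bus_width_bytes * channels


# DDR4-2933 with 4 populated channels (the case in the message above).
four_ch_2933 = peak_bandwidth_mb_s(2933, channels=4)

# DDR4-2666: a single channel, then the 6-channel case from the quoted reply.
one_ch_2666 = peak_bandwidth_mb_s(2666, channels=1)
six_ch_2666 = peak_bandwidth_mb_s(2666, channels=6)

# Share of the 4-channel peak reached by the measured 12 GB/s single-core writer.
single_core_fraction = 12_000 / four_ch_2933

print(f"DDR4-2933 x 4 channels: {four_ch_2933} MB/s")
print(f"DDR4-2666 x 1 channel:  {one_ch_2666} MB/s")
print(f"DDR4-2666 x 6 channels: {six_ch_2666} MB/s")
print(f"single core / 4-channel peak: {single_core_fraction:.1%}")
```

With these inputs the single-core figure comes out at roughly 13% of the 4-channel peak, and a measured 16 GB/s per DDR4-2666 channel would be about 75% of that channel's theoretical 21.3 GB/s.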
>
> On Sat, May 21, 2022 at 11:42 AM Antonio Di Bacco wrote:
> >
> > I read a couple of articles
> > (https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems?xtxsearchselecthit=1
> > and https://www.exxactcorp.com/blog/HPC/balance-memory-guidelines-for-intel-xeon-scalable-family-processors)
> > and I understood a little bit more.
> >
> > If the Xeon memory controller is able to spread contiguous memory
> > accesses onto different channels in hardware (as Stephen correctly
> > stated), then how can the DPDK -n option benefit an application?
> > I also coded a test application that writes a 1GB hugepage and measures
> > the time needed but, after installing two additional DIMMs on two unused
> > channels of my six-channel motherboard (X11DPi-NT), I
> > didn't observe any improvement. This is strange because adding two
> > channels to the 4 already populated should make a noticeable
> > difference.
> >
> > For reference, this is the small program for allocating and writing memory:
> > https://github.com/adibacco/simple_mp_mem_2
> > and the results with 4 memory channels:
> > https://docs.google.com/spreadsheets/d/1mDoKYLMhMMKDaOS3RuGEnpPgRNKuZOy4lMIhG-1N7B8/edit?usp=sharing
> >
> >
> > On Fri, May 20, 2022 at 5:48 PM Stephen Hemminger wrote:
> > >
> > > On Fri, 20 May 2022 10:34:46 +0200
> > > Antonio Di Bacco wrote:
> > >
> > > > Let us say I have two memory channels, each one with its own 16GB memory
> > > > module. I suppose the first memory channel will be used when addressing
> > > > physical memory in the range 0 to 0x3 ffff ffff and the second when
> > > > addressing physical memory in the range 0x4 0000 0000 to 0x7 ffff ffff.
> > > > Correct?
> > > > Now, I need to have a 2GB buffer with one "writer" and one "reader". The
> > > > writer writes to one half of the buffer (call it A) and, in the meantime, the
> > > > reader reads from the other half (B).
> > > > When the writer finishes writing its
> > > > half buffer (A), it signals the reader and they swap: the reader starts
> > > > to read from A and the writer starts to write to B.
> > > > If I allocate the whole buffer (on two 1GB hugepages) across the two memory
> > > > channels, one half of the buffer is allocated at the end of the first channel
> > > > while the other half is allocated at the start of the second memory
> > > > channel. Would this increase performance compared to the whole buffer
> > > > allocated within the same memory channel?
> > >
> > > Most systems just interleave memory chips based on number of filled slots.
> > > This is handled by BIOS before kernel even starts.
> > > The DPDK has a number of memory channels parameter and what it does
> > > is try and optimize memory allocation by spreading.
> > >
> > > Looks like you are inventing your own limited version of what memif does.
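
The writer/reader handoff described in the original question can be sketched as below. This is only an illustration of the swap logic, not DPDK code. Python threads stand in for the writer and reader cores, a small bytearray stands in for the two 1GB hugepages, and the barrier used for signalling is an assumption (the thread does not say how the signal is implemented). All names and sizes are hypothetical, and the sketch says nothing about memory-channel placement or about how memif works.

```python
import threading

HALF = 4096    # hypothetical stand-in for one 1GB hugepage
ROUNDS = 4     # number of half-buffer fills to simulate


def run_ping_pong(half_size=HALF, rounds=ROUNDS):
    buf = bytearray(2 * half_size)
    view = memoryview(buf)
    halves = [view[:half_size], view[half_size:]]   # halves A and B
    handoff = threading.Barrier(2)   # both sides meet here to swap halves
    observed = []

    def writer():
        for r in range(rounds):
            # Fill the current half with a per-round tag value.
            halves[r % 2][:] = bytes([r + 1]) * half_size
            handoff.wait()   # half is complete: signal the reader and swap

    def reader():
        for r in range(rounds):
            handoff.wait()   # wait until the writer has finished half r % 2
            # Read that half while the writer fills the other one.
            observed.append(set(halves[r % 2].tobytes()))

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return observed


result = run_ping_pong()
print(result)
```

Each entry of `result` is the set of byte values the reader saw in one half. A single value per entry means the reader never observed a half that was still being written: the writer cannot return to a half until the reader has reached the next barrier, which happens only after it has finished reading that half.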