DPDK patches and discussions
From: "Mattias Rönnblom" <hofors@lysator.liu.se>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: "Varghese, Vipin" <Vipin.Varghese@amd.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"Yigit, Ferruh" <Ferruh.Yigit@amd.com>,
	"dev@dpdk.org" <dev@dpdk.org>, nd <nd@arm.com>
Subject: Re: [RFC 0/2] introduce LLC aware functions
Date: Thu, 12 Sep 2024 18:32:38 +0200	[thread overview]
Message-ID: <8a882827-874b-4e2d-ae89-d0b243ff7e77@lysator.liu.se> (raw)
In-Reply-To: <ZuLtBLIkyUQJ01S4@bricha3-mobl1.ger.corp.intel.com>

On 2024-09-12 15:30, Bruce Richardson wrote:
> On Thu, Sep 12, 2024 at 01:59:34PM +0200, Mattias Rönnblom wrote:
>> On 2024-09-12 13:17, Varghese, Vipin wrote:
>>> [AMD Official Use Only - AMD Internal Distribution Only]
>>>
>>> <snipped>
>>>>>>>> Thank you Mattias for the information. As shared in the reply
>>>>>>>> to Anatoly, we want to expose a new API `rte_get_next_lcore_ex`
>>>>>>>> which takes an extra argument `u32 flags`.
>>>>>>>> The flags can be RTE_GET_LCORE_L1 (SMT), RTE_GET_LCORE_L2,
>>>>>>>> RTE_GET_LCORE_L3, RTE_GET_LCORE_BOOST_ENABLED,
>>>>>>>> RTE_GET_LCORE_BOOST_DISABLED.
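
(For concreteness, my reading of the proposal is something like the
sketch below. The exact signature is my guess, modeled on today's
rte_get_next_lcore(i, skip_main, wrap); everything beyond the flags
argument is an assumption on my part.)

    /* Hypothetical sketch only; not the actual RFC code. */
    unsigned int
    rte_get_next_lcore_ex(unsigned int i, int skip_main, int wrap,
                          uint32_t flags);

    /* e.g., the next enabled lcore sharing an L3 with lcore i: */
    next = rte_get_next_lcore_ex(i, 1, 0, RTE_GET_LCORE_L3);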
>>>>>>>
>>>>>>> Wouldn't using that API be pretty awkward to use?
>>>>> The current API available in DPDK is `rte_get_next_lcore`, which is
>>>>> used within the DPDK examples and in customer solutions.
>>>>> Based on the comments from others, we responded to the idea of
>>>>> changing the new API from `rte_get_next_lcore_llc` to
>>>>> `rte_get_next_lcore_exntd`.
>>>>>
>>>>> Can you please help us understand what is `awkward`?
>>>>>
>>>>
>>>> The awkwardness starts when you try to fit hwloc-type information
>>>> into an API that was designed for iterating over lcores.
>>> I disagree with this point. The current implementation of the lcore
>>> library is only focused on iterating through the list of enabled
>>> cores, the core mask, and the lcore map.
>>> With ever-increasing core counts, memory, I/O and accelerators on
>>> SoCs, sub-NUMA partitioning is common in SoCs from various vendors.
>>> Enhancing or augmenting the lcore API to extract or provision NUMA
>>> and cache topology is not awkward.
>>
>> DPDK providing an API for this information makes sense to me, as I've
>> mentioned before. What I questioned was the way it was done (i.e., the API
>> design) in your RFC, and the limited scope (which in part you have
>> addressed).
>>
> 
> Actually, I'd like to touch on this first item a little bit. What is the
> main benefit of providing this information in EAL? To me, it seems like
> something intended for apps that try to be super-smart and select
> particular cores, out of a set of cores, to run on. However, is that not
> taking over work that should really be the job of the person deploying
> the app? The deployer - if I can use that term - has already selected a
> set of cores and NICs for a DPDK application to use. Should they not
> also be the one selecting - via app argument, via the --lcores flag to
> map one core id to another, or otherwise - which part of an application
> should run on which particular piece of hardware?
> 
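
(As an aside, for readers: the remapping mechanism referred to above is
the EAL --lcores option; for example

    ./dpdk-app --lcores '0@8,1@9' ...

runs lcore 0 on physical CPU 8 and lcore 1 on CPU 9. The app name and
CPU numbers are made up for illustration.)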

Scheduling in one form or another will happen on a number of levels. One
level is what you call the "deployer". Whether man or machine, it will
allocate a bunch of lcores to the application - either statically, by
using -l <cores>, or dynamically, by giving a very large core mask,
combined with having an agent in the app responsible for scaling up or
down the number of cores actually used (allowing coexistence with other,
non-DPDK, Linux process scheduler-scheduled processes on the same set of
cores, although not at the same time).
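
As a minimal sketch of such an in-app agent - assuming an app-defined
do_work() and leaving the actual scaling policy out - it could look
something like:

    #include <stdbool.h>
    #include <rte_lcore.h>
    #include <rte_launch.h>

    static volatile bool active[RTE_MAX_LCORE];

    static int
    worker_loop(void *arg)
    {
            unsigned int lcore = rte_lcore_id();

            (void)arg;
            while (active[lcore])
                    do_work(); /* app-defined */
            return 0;
    }

    static void
    scale_up(unsigned int lcore)
    {
            active[lcore] = true;
            rte_eal_remote_launch(worker_loop, NULL, lcore);
    }

    static void
    scale_down(unsigned int lcore)
    {
            active[lcore] = false;
            /* once the worker returns, the lcore sits idle and the
             * kernel is free to schedule other processes on it */
            rte_eal_wait_lcore(lcore);
    }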

I think the "deployer" level should generally not be aware of the DPDK 
app internals, including how to assign different tasks to different 
cores. That is consistent with how things work in a general-purpose 
operating system, where you allocate cores, memory and I/O devices to an 
instance (e.g., a VM), but then the OS's scheduler figures out how best
use them.

The app internals may be complicated, change across software versions
and traffic mixes/patterns, and, most of all, may not lend themselves to
static at-start configuration at all.

> In summary, what is the final real-world intended use case for this work?

One real-world example is an Eventdev app with some atomic processing
stage, using DSW, and SMT. Hardware threading on Intel x86 generally
improves performance by ~25%, which seems to hold true for data plane
apps as well, in my experience. So that's a (not-so-)freebie you don't
want to miss out on. To max out single-flow performance, the work
scheduler may need to give the bottleneck-stage atomic processing for
that elephant flow not only 100% of an lcore, but a *full* physical core
(i.e., ensure that the SMT sibling is idle). But DSW doesn't understand
the CPU topology, so you have to choose between max multi-flow
throughput and max single-flow throughput at the time of deployment. An
RTE hwtopo API would certainly help in the implementation of SMT-aware
scheduling.
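
To illustrate: with such an API, DSW (or the app) could pick a home for
the bottleneck stage along the lines of the sketch below, where
rte_lcore_smt_sibling() is a hypothetical hwtopo accessor and
lcore_is_busy() is app-defined:

    /* Find a worker lcore whose SMT sibling is unused, so the
     * elephant-flow stage gets a full physical core. */
    static int
    find_full_physical_core(void)
    {
            unsigned int lcore;

            RTE_LCORE_FOREACH_WORKER(lcore) {
                    unsigned int sib = rte_lcore_smt_sibling(lcore);

                    if (sib == RTE_MAX_LCORE || !lcore_is_busy(sib))
                            return lcore;
            }
            return -1; /* no such lcore available */
    }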

Another example could be the use of bigger or turbo-capable cores to run
CPU-hungry, singleton services (e.g., an Eventdev RX timer adapter
core), or the use of a hardware thread to run the SW scheduler service
(which needs to react quickly to incoming scheduling events, but maybe
does not need all the cycles of a full physical core).
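
The service core plumbing for the latter already exists; what is
missing is a portable way to pick the right lcore. A sketch, with
pick_hw_thread_lcore() standing in for the hypothetical hwtopo query:

    #include <rte_service.h>
    #include <rte_eventdev.h>

    uint32_t service_id, lcore;

    /* the SW eventdev's scheduler runs as a service */
    rte_event_dev_service_id_get(dev_id, &service_id);

    lcore = pick_hw_thread_lcore(); /* hypothetical hwtopo query */

    rte_service_lcore_add(lcore);
    rte_service_map_lcore_set(service_id, lcore, 1);
    rte_service_lcore_start(lcore);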

Yet another example would be an event device which understands how to
spread a particular flow across multiple cores, but uses only cores
sharing the same L2. Or, one that keeps processing of a certain kind
(e.g., a certain Eventdev queue) on cores with the same L2, improving L2
hit rates for instructions and data related to that processing stage.
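
In code, building such an L2 group could look like the below, with
rte_lcore_l2_id() as a hypothetical hwtopo accessor:

    /* Collect up to 'cap' worker lcores sharing an L2 with 'ref'. */
    static unsigned int
    lcores_sharing_l2(unsigned int ref, unsigned int group[],
                      unsigned int cap)
    {
            unsigned int lcore, n = 0;

            RTE_LCORE_FOREACH_WORKER(lcore)
                    if (lcore != ref && n < cap &&
                        rte_lcore_l2_id(lcore) == rte_lcore_l2_id(ref))
                            group[n++] = lcore;

            return n;
    }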

> DPDK already tries to be smart about cores and NUMA, and in some cases we
> have hit issues where users have - for their own valid reasons - wanted to
> run DPDK in a sub-optimal way, and they end up having to fight DPDK's
> smarts in order to do so! Ref: [1]
> 
> /Bruce
> 
> [1] https://git.dpdk.org/dpdk/commit/?id=ed34d87d9cfbae8b908159f60df2008e45e4c39f

Thread overview: 53+ messages
2024-08-27 15:10 Vipin Varghese
2024-08-27 15:10 ` [RFC 1/2] eal: add llc " Vipin Varghese
2024-08-27 17:36   ` Stephen Hemminger
2024-09-02  0:27     ` Varghese, Vipin
2024-08-27 20:56   ` Wathsala Wathawana Vithanage
2024-08-29  3:21     ` Reply: " Feifei Wang
2024-09-02  1:20     ` Varghese, Vipin
2024-09-03 17:54       ` Wathsala Wathawana Vithanage
2024-09-04  8:18         ` Bruce Richardson
2024-09-06 11:59         ` Varghese, Vipin
2024-09-12 16:58           ` Wathsala Wathawana Vithanage
2024-08-27 15:10 ` [RFC 2/2] eal/lcore: add llc aware for each macro Vipin Varghese
2024-08-27 21:23 ` [RFC 0/2] introduce LLC aware functions Mattias Rönnblom
2024-09-02  0:39   ` Varghese, Vipin
2024-09-04  9:30     ` Mattias Rönnblom
2024-09-04 14:37       ` Stephen Hemminger
2024-09-11  3:13         ` Varghese, Vipin
2024-09-11  3:53           ` Stephen Hemminger
2024-09-12  1:11             ` Varghese, Vipin
2024-09-09 14:22       ` Varghese, Vipin
2024-09-09 14:52         ` Mattias Rönnblom
2024-09-11  3:26           ` Varghese, Vipin
2024-09-11 15:55             ` Mattias Rönnblom
2024-09-11 17:04               ` Honnappa Nagarahalli
2024-09-12  1:33                 ` Varghese, Vipin
2024-09-12  6:38                   ` Mattias Rönnblom
2024-09-12  7:02                     ` Mattias Rönnblom
2024-09-12 11:23                       ` Varghese, Vipin
2024-09-12 12:12                         ` Mattias Rönnblom
2024-09-12 15:50                           ` Stephen Hemminger
2024-09-12 11:17                     ` Varghese, Vipin
2024-09-12 11:59                       ` Mattias Rönnblom
2024-09-12 13:30                         ` Bruce Richardson
2024-09-12 16:32                           ` Mattias Rönnblom [this message]
2024-09-12  2:28                 ` Varghese, Vipin
2024-09-11 16:01             ` Bruce Richardson
2024-09-11 22:25               ` Konstantin Ananyev
2024-09-12  2:38                 ` Varghese, Vipin
2024-09-12  2:19               ` Varghese, Vipin
2024-09-12  9:17                 ` Bruce Richardson
2024-09-12 11:50                   ` Varghese, Vipin
2024-09-13 14:15                     ` Burakov, Anatoly
2024-09-12 13:18                   ` Mattias Rönnblom
2024-08-28  8:38 ` Burakov, Anatoly
2024-09-02  1:08   ` Varghese, Vipin
2024-09-02 14:17     ` Burakov, Anatoly
2024-09-02 15:33       ` Varghese, Vipin
2024-09-03  8:50         ` Burakov, Anatoly
2024-09-05 13:05           ` Ferruh Yigit
2024-09-05 14:45             ` Burakov, Anatoly
2024-09-05 15:34               ` Ferruh Yigit
2024-09-06  8:44                 ` Burakov, Anatoly
2024-09-09 14:14                   ` Varghese, Vipin
