From: Mattias Rönnblom
Date: Wed, 4 Sep 2024 11:30:59 +0200
Subject: Re: [RFC 0/2] introduce LLC aware functions
To: "Varghese, Vipin", ferruh.yigit@amd.com, dev@dpdk.org
On 2024-09-02 02:39, Varghese, Vipin wrote:
>
> Thank you, Mattias, for the comments and the question. Please let me
> try to explain below.
>
>> We shouldn't have a separate CPU/cache hierarchy API instead?
>
> Based on the intention to bring in CPU lcores which share the same L3
> (for better cache hits and fewer noisy neighbors), the current API
> focuses on the Last Level Cache. But if the suggestion is "there are
> SoCs where the L2 cache is also shared, and the new API should provide
> for that", I am also comfortable with the thought.
>

Rather than some AMD special case API hacked into , I think we are better off with no DPDK API at all for this kind of functionality.

A DPDK CPU/memory hierarchy topology API very much makes sense, but it should be reasonably generic and complete from the start.

>>
>> Could potentially be built on the 'hwloc' library.
>
> There are three reasons we did not explore this path on AMD SoCs:
>
> 1. Depending on the hwloc version and kernel version, certain SoC
>    hierarchies are not available.
>
> 2. CPU NUMA and I/O (memory & PCIe) NUMA are independent on AMD EPYC
>    SoCs.
>
> 3. It adds an extra library dependency that must be made available for
>    this to work.
>
> Hence we have tried to use the documented, generic Linux `sysfs CPU
> cache` layer.
>
> I will explore hwloc further and check whether other libraries within
> DPDK already use it.
>
>>
>> I much agree cache/core topology may be of interest to the application
>> (or a work scheduler, like a DPDK event device), but it's not limited to
>> LLC. It may well be worthwhile to care about which cores share L2
>> cache, for example. Not sure the RTE_LCORE_FOREACH_* approach scales.
>
> Yes, totally understood: on some SoCs, multiple lcores share the same
> L2 cache.
>
> Can we rework the API to be rte_get_cache_ where the user argument is
> the desired lcore index?
>
> 1. index-1: SMT threads
>
> 2. index-2: threads sharing the same L2 cache
>
> 3. index-3: threads sharing the same L3 cache
>
> 4. index-MAX: threads sharing the last level cache
>
>>
>>> < Function: Purpose >
>>> ---------------------
>>>  - rte_get_llc_first_lcores: Retrieves all the first lcores in the
>>>    shared LLC.
>>>  - rte_get_llc_lcore: Retrieves all lcores that share the LLC.
>>>  - rte_get_llc_n_lcore: Retrieves the first n or skips the first n
>>>    lcores in the shared LLC.
>>>
>>> < MACRO: Purpose >
>>> ------------------
>>> RTE_LCORE_FOREACH_LLC_FIRST: iterates through the first lcore of each
>>> LLC.
>>> RTE_LCORE_FOREACH_LLC_FIRST_WORKER: iterates through the first worker
>>> lcore of each LLC.
>>> RTE_LCORE_FOREACH_LLC_WORKER: iterates through the lcores of an LLC,
>>> based on a hint (lcore id).
>>> RTE_LCORE_FOREACH_LLC_SKIP_FIRST_WORKER: iterates through the lcores of
>>> an LLC while skipping the first worker.
>>> RTE_LCORE_FOREACH_LLC_FIRST_N_WORKER: iterates through `n` lcores of
>>> each LLC.
>>> RTE_LCORE_FOREACH_LLC_SKIP_N_WORKER: skips the first `n` lcores, then
>>> iterates through the remaining lcores of each LLC.
>>>
>
> The macros are simple wrappers invoking the appropriate API. Can this
> be worked out in this fashion?
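Both the LLC-specific RFC and the more generic alternative discussed in this thread rest on the Linux sysfs CPU cache interface. The following is a minimal sketch, under stated assumptions, of what a level-parameterized query over that interface could look like: it reads `/sys/devices/system/cpu/cpuN/cache/indexM/{level,type,shared_cpu_list}` and collects the CPUs that share a given CPU's cache at a requested level. All `topo_*` names, the `TOPO_FOREACH_CPU` macro and the fixed `TOPO_MAX_CPUS` bound are hypothetical illustrations, not DPDK API. Note that sysfs `indexM` directories number cache instances rather than levels, so this numbering differs from the index-1/2/3 scheme proposed above, and mapping DPDK lcore ids to CPU ids is not shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical upper bound on CPU ids; real code would size this dynamically. */
#define TOPO_MAX_CPUS 1024
#define TOPO_CACHE_DIR "/sys/devices/system/cpu/cpu%u/cache/index%u/"

/* Minimal CPU bitmap for illustration (not DPDK's rte_cpuset_t). */
struct topo_cpuset {
	uint64_t bits[TOPO_MAX_CPUS / 64];
};

static inline void topo_cpuset_set(struct topo_cpuset *s, unsigned int cpu)
{
	s->bits[cpu / 64] |= UINT64_C(1) << (cpu % 64);
}

static inline bool topo_cpuset_isset(const struct topo_cpuset *s, unsigned int cpu)
{
	return cpu < TOPO_MAX_CPUS && ((s->bits[cpu / 64] >> (cpu % 64)) & 1);
}

/* Hypothetical iterator: visits every CPU id in @set in ascending order. */
#define TOPO_FOREACH_CPU(cpu, set)                      \
	for ((cpu) = 0; (cpu) < TOPO_MAX_CPUS; (cpu)++) \
		if (!topo_cpuset_isset((set), (cpu))) {} else

/* Parse a Linux cpulist string (e.g. "0-3,8,10-11\n") into @set.
 * Returns 0 on success, -1 on malformed input or an out-of-range id. */
int topo_parse_cpulist(const char *str, struct topo_cpuset *set)
{
	const char *p = str;

	memset(set, 0, sizeof(*set));
	while (*p != '\0' && *p != '\n') {
		char *end;
		unsigned long lo = strtoul(p, &end, 10);
		unsigned long hi = lo;

		if (end == p)
			return -1;
		if (*end == '-') { /* range "lo-hi" */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p || hi < lo)
				return -1;
		}
		if (hi >= TOPO_MAX_CPUS)
			return -1;
		for (unsigned long c = lo; c <= hi; c++)
			topo_cpuset_set(set, (unsigned int)c);
		if (*end == ',')
			end++;
		else if (*end != '\0' && *end != '\n')
			return -1;
		p = end;
	}
	return 0;
}

/* Read the first line of a sysfs file into @buf. Returns 0 on success. */
static int topo_read_sysfs(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	char *ok;

	if (f == NULL)
		return -1;
	ok = fgets(buf, (int)len, f);
	fclose(f);
	return ok != NULL ? 0 : -1;
}

/* Collect the CPUs sharing @cpu's data/unified cache at @level (1, 2, 3...).
 * Returns 0 on success, -1 if sysfs reports no such cache. */
int topo_cache_siblings(unsigned int cpu, unsigned int level, struct topo_cpuset *set)
{
	char path[256], buf[4096];

	for (unsigned int idx = 0;; idx++) {
		snprintf(path, sizeof(path), TOPO_CACHE_DIR "level", cpu, idx);
		if (topo_read_sysfs(path, buf, sizeof(buf)) != 0)
			return -1; /* ran out of cache indexes */
		if (strtoul(buf, NULL, 10) != level)
			continue;
		snprintf(path, sizeof(path), TOPO_CACHE_DIR "type", cpu, idx);
		if (topo_read_sysfs(path, buf, sizeof(buf)) == 0 &&
		    strncmp(buf, "Instruction", 11) == 0)
			continue; /* skip L1i; we want the data/unified cache */
		snprintf(path, sizeof(path), TOPO_CACHE_DIR "shared_cpu_list", cpu, idx);
		if (topo_read_sysfs(path, buf, sizeof(buf)) != 0)
			return -1;
		return topo_parse_cpulist(buf, set);
	}
}
```

For example, calling `topo_cache_siblings(0, 3, &set)` and then `TOPO_FOREACH_CPU(c, &set)` would visit the CPUs sharing CPU 0's L3 cache on a system that reports one. SMT siblings (the proposed index-1 case) are exposed separately in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, which uses the same cpulist format and could be fed to the same parser. Taking the cache level as a parameter, rather than adding one function or macro per LLC use case, is one way to keep such an API generic across SoCs where L2 or L3 is the shared level.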