From: Konstantin Ananyev
To: Bruce Richardson, "Varghese, Vipin"
CC: Mattias Rönnblom, "Yigit, Ferruh", dev@dpdk.org
Subject: RE: [RFC 0/2] introduce LLC aware functions
Date: Wed, 11 Sep 2024 22:25:41 +0000
Message-ID: <5b0a3e6e91a3404d90224420e1cd758d@huawei.com>

> > > >>> Thank you Mattias for the comments and question; please let me
> > > >>> try to explain below.
> > > >>>
> > > >>>> Shouldn't we have a separate CPU/cache hierarchy API instead?
> > > >>>
> > > >>> Based on the intention to bring in CPU lcores which share the
> > > >>> same L3 (for better cache hits and a less noisy neighbor), the
> > > >>> current API focuses on the Last Level Cache. But if the
> > > >>> suggestion is "there are SoCs where the L2 cache is also shared,
> > > >>> and the new API should provision for that", I am also
> > > >>> comfortable with the thought.
> > > >>
> > > >> Rather than some AMD special case API hacked into <rte_lcore.h>,
> > > >> I think we are better off with no DPDK API at all for this kind
> > > >> of functionality.
> > > >
> > > > Hi Mattias, as shared in the earlier email thread, this is not an
> > > > AMD special case at all. Let me try to explain this one more time.
> > > > One of the techniques used to increase core count in a
> > > > cost-effective way is to build tiles of compute complexes. This
> > > > introduces a bunch of cores sharing the same Last Level Cache
> > > > (namely L2, L3 or even L4), depending on the cache topology of the
> > > > architecture.
> > > >
> > > > The API suggested in the RFC is to help end users selectively use
> > > > cores under the same Last Level Cache hierarchy, as advertised by
> > > > the OS (irrespective of the BIOS settings used). This is useful in
> > > > both bare-metal and container environments.
> > >
> > > I'm pretty familiar with AMD CPUs and the use of tiles (including
> > > the challenges these kinds of non-uniformities pose for work
> > > scheduling).
> > >
> > > To maximize performance, caring about the core<->LLC relationship
> > > may well not be enough, and more HT/core/cache/memory topology
> > > information is required. That's what I meant by special case. A
> > > proper API should allow access to information about which lcores
> > > are SMT siblings, cores on the same L2, and cores on the same L3,
> > > to name a few things. Probably you want to fit NUMA into the same
> > > API as well, although that information is already available.
> >
> > Thank you Mattias for the information. As shared in the reply to
> > Anatoly, we want to expose a new API `rte_get_next_lcore_ex` which
> > takes an extra argument `u32 flags`. The flags can be
> > RTE_GET_LCORE_L1 (SMT), RTE_GET_LCORE_L2, RTE_GET_LCORE_L3,
> > RTE_GET_LCORE_BOOST_ENABLED and RTE_GET_LCORE_BOOST_DISABLED.
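To make the proposal concrete, here is a minimal sketch of how the
flag-based iterator could look and be used. Only the function name and
the flag names come from the thread; the prototype, the flag values and
the helper function are assumptions for illustration, not a merged DPDK
interface.

#include <stdint.h>
#include <rte_lcore.h>
#include <rte_launch.h>

/* Flag values are illustrative; the RFC names the flags but assigns
 * no values. */
#define RTE_GET_LCORE_L1 (UINT32_C(1) << 0) /* SMT siblings */
#define RTE_GET_LCORE_L2 (UINT32_C(1) << 1) /* same L2 complex */
#define RTE_GET_LCORE_L3 (UINT32_C(1) << 2) /* same L3 / LLC */

/* Assumed prototype: same contract as rte_get_next_lcore(), plus a
 * flags argument restricting the walk to lcores that share the given
 * topology domain. */
unsigned int rte_get_next_lcore_ex(unsigned int i, int skip_main,
                                   int wrap, uint32_t flags);

/* Usage: launch a worker on every lcore sharing the same LLC. */
static void
launch_llc_local(lcore_function_t *fn, void *arg)
{
    unsigned int lc = rte_get_next_lcore_ex(-1, 1, 0, RTE_GET_LCORE_L3);

    while (lc < RTE_MAX_LCORE) {
        rte_eal_remote_launch(fn, arg, lc);
        lc = rte_get_next_lcore_ex(lc, 1, 0, RTE_GET_LCORE_L3);
    }
}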
> For the naming, would "rte_get_next_sibling_core" (or lcore, if you
> prefer) be a clearer name than just adding "ex" onto the end of the
> existing function?
>
> Looking logically, I'm not sure about the BOOST_ENABLED and
> BOOST_DISABLED flags you propose - in a system with multiple possible
> standard and boost frequencies, what would those correspond to? What's
> also missing is a define for getting actual NUMA siblings, i.e. those
> sharing common memory but not an L3 or anything else.
>
> My suggestion would be to have the function take just an integer-type
> (e.g. uint16_t) parameter which defines the memory/cache hierarchy
> level to use, 0 being the lowest, 1 the next, and so on. Different
> systems may have different numbers of cache levels, so let's just make
> it a zero-based index of levels rather than giving explicit defines
> (except for memory, which should probably always be last). The zero
> level will be the "closest neighbour", whatever that happens to be,
> with as many levels as are necessary to express the topology: e.g.
> without SMT but with three cache levels, level 0 would be an L2
> neighbour and level 1 an L3 neighbour. If the L3 is split within a
> memory NUMA node, then level 2 would give the NUMA siblings. We'd just
> need an API to return the max number of levels along with the iterator.

Sounds like a neat idea to me.
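For comparison, a minimal sketch of the level-indexed scheme Bruce
describes above. The thread specifies the semantics (a zero-based
hierarchy level plus a query for the number of levels) but no
prototypes, so every name and signature here is an assumption.

#include <stdint.h>
#include <stdio.h>
#include <rte_lcore.h>

/* Assumed queries: the number of topology levels on this system
 * (caches first, memory/NUMA last), and the next lcore that is a
 * neighbour of lcore i at the given zero-based level. */
uint16_t rte_lcore_topology_levels(void);
unsigned int rte_get_next_lcore_level(unsigned int i, int skip_main,
                                      int wrap, uint16_t level);

/* Usage: report how many lcores share each level with lcore 0; on a
 * non-SMT part with three cache levels, level 0 would be L2
 * neighbours, level 1 L3 neighbours, level 2 NUMA siblings. */
static void
print_domain_sizes(void)
{
    for (uint16_t lvl = 0; lvl < rte_lcore_topology_levels(); lvl++) {
        unsigned int cnt = 0;
        unsigned int lc = rte_get_next_lcore_level(0, 0, 0, lvl);

        while (lc < RTE_MAX_LCORE) {
            cnt++;
            lc = rte_get_next_lcore_level(lc, 0, 0, lvl);
        }
        printf("level %u: %u siblings of lcore 0\n",
               (unsigned int)lvl, cnt);
    }
}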