From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ilya Maximets
To: "Burakov, Anatoly", Hemant Agrawal, Asaf Sinai, "dev@dpdk.org", Thomas Monjalon
Cc: Ilia Ferdman, Sasha Hodos
Date: Tue, 27 Nov 2018 19:49:58 +0300
Message-ID: <71f711a9-c493-8df0-1888-fef5f40c80bd@samsung.com>
In-Reply-To: <3e685127-a32a-3c4c-63d5-5ccdc0181fbc@intel.com>
List-Id: DPDK patches and discussions
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration

On 27.11.2018 13:33, Burakov, Anatoly
wrote:
> On 27-Nov-18 10:26 AM, Hemant Agrawal wrote:
>>
>> On 11/26/2018 8:55 PM, Asaf Sinai wrote:
>>> +CC Ilia & Sasha.
>>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly
>>> Sent: Monday, November 26, 2018 04:57 PM
>>> To: Ilya Maximets; Asaf Sinai; dev@dpdk.org; Thomas Monjalon
>>> Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
>>>
>>> On 26-Nov-18 2:32 PM, Ilya Maximets wrote:
>>>> On 26.11.2018 17:21, Burakov, Anatoly wrote:
>>>>> On 26-Nov-18 2:10 PM, Ilya Maximets wrote:
>>>>>> On 26.11.2018 16:42, Burakov, Anatoly wrote:
>>>>>>> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>>>>>>>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>>>>>>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>>>>>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>>>>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>>>>>>>> Hi Anatoly,
>>>>>>>>>>>>
>>>>>>>>>>>> We did not check it with "testpmd", only with our application.
>>>>>>>>>>>> From the beginning, we did not enable this configuration (look at attached files), and everything works fine.
>>>>>>>>>>>> Of course we rebuild DPDK when we change the configuration.
>>>>>>>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>>>>>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are describing. This is not intended behavior. I will look into it.
>>>>>>>>>>>
>>>>>>>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>>>>>>>
>>>>>>>>>> Looking at the code, I think this config option needs to be reworked and we should clarify what we mean by this option. It appears that I've misunderstood what this option was actually intended to do, and I also think its naming could be improved because it's confusing and misleading.
>>>>>>>>>>
>>>>>>>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it merely disables using libnuma to perform memory allocation. This looks like intended (if counter-intuitive) behavior - disabling this option will simply revert DPDK to working as it did before this option was introduced (i.e. best-effort allocation). This is why your code still works - because EAL still does allocate memory on socket 1, and *knows* that it's socket 1 memory. It still supports NUMA.
>>>>>>>>>>
>>>>>>>>>> The commit message for these changes states that the actual purpose of this option is to enable "balanced" hugepage allocation. In case of cgroups limitations, previously, DPDK would've exhausted all hugepages on master core's socket before attempting to allocate from other sockets, but by the time we've reached cgroups limits on numbers of hugepages, we might not have reached socket 1 and thus missed out on the pages we could've allocated, but didn't. Using libnuma solves this issue, because now we can allocate pages on sockets we want, instead of hoping we won't run out of hugepages before we get the memory we need.
>>>>>>>>>>
>>>>>>>>>> In 18.05 onwards, this option works differently (and arguably wrong). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only allocating memory from socket 1 is disabled, but allocating from socket 0 may even get you memory from socket 1!
>>>>>>>>> I'd consider this as a bug.
>>>>>>>>>
>>>>>>>>>> +CC Thomas
>>>>>>>>>>
>>>>>>>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes it seem like this option disables NUMA support, which is not the case.
>>>>>>>>>>
>>>>>>>>>> I would also argue that it is not relevant to the 18.05+ memory subsystem, and should only work in legacy mode, because it is *impossible* to make it work right in the new memory subsystem, and here's why:
>>>>>>>>>>
>>>>>>>>>> Without libnuma, we have no way of "asking" the kernel to allocate a hugepage on a specific socket - instead, any allocation will most likely happen on the socket from which the allocation request came. For example, if the user program's lcore is on socket 1, allocation on socket 0 will actually allocate a page on socket 1.
>>>>>>>>>>
>>>>>>>>>> If we don't check for the page's NUMA node affinity (which is what currently happens) - we get performance degradation because we may unintentionally allocate memory on the wrong NUMA node. If we do check for this - then allocation of memory on socket 1 from an lcore on socket 0 will almost never succeed, because the kernel will always give us pages on socket 0.
>>>>>>>>>>
>>>>>>>>>> Put simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
>>>>>>>>> I agree that the new memory model could not work without libnuma,
>>>>>>>>> i.e. it will lead to unpredictable memory allocations without any
>>>>>>>>> respect to the requested socket_id's. I also agree that
>>>>>>>>> CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only sane for a legacy memory model.
>>>>>>>>> It looks like we have no other choice than to just drop the option
>>>>>>>>> and make the code unconditional, i.e. have a hard dependency on libnuma.
>>>>>>>>>
>>>>>>>> We, probably, could compile this code and have a hard dependency
>>>>>>>> only for platforms with 'RTE_MAX_NUMA_NODES > 1'.
>>>>>>> Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.
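
The dilemma Anatoly describes in the quoted text above (allocating without libnuma, with or without an affinity check) can be illustrated with a toy model. This is a sketch in Python, not DPDK or kernel code: `Page`, `kernel_alloc_local_first`, `eal_alloc` and the `verify_node` flag are hypothetical names, and the "caller's node first, then any other node" policy is an assumption standing in for the kernel's default placement.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Page:
    node: int  # NUMA node the (modelled) kernel actually placed the page on


def kernel_alloc_local_first(caller_node: int, free_pages: Dict[int, int]) -> Optional[Page]:
    """Model of allocation without libnuma: the caller cannot choose the node.

    Assumption: pages come from the caller's own node while it has free
    pages, and from any other node only as a fallback.
    """
    if free_pages.get(caller_node, 0) > 0:
        free_pages[caller_node] -= 1
        return Page(node=caller_node)
    for node in sorted(free_pages):
        if free_pages[node] > 0:
            free_pages[node] -= 1
            return Page(node=node)
    return None


def eal_alloc(requested_socket: int, caller_node: int,
              free_pages: Dict[int, int], verify_node: bool) -> Optional[Page]:
    """Hypothetical allocator front end for a request on `requested_socket`."""
    page = kernel_alloc_local_first(caller_node, free_pages)
    if page is None:
        return None
    if verify_node and page.node != requested_socket:
        # Affinity check enabled: the page is on the wrong node, so give it
        # back and report failure.
        free_pages[page.node] += 1
        return None
    # Affinity check disabled: the page is handed out even if it is remote.
    return page


if __name__ == "__main__":
    free = {0: 4, 1: 4}
    # An lcore on node 0 asks for memory on socket 1.
    print(eal_alloc(1, 0, free, verify_node=False))  # page silently lands on node 0
    print(eal_alloc(1, 0, free, verify_node=True))   # request fails
```

In this model neither setting of `verify_node` gives the caller what it asked for: without the check the memory is silently remote, with the check the request fails even though socket 1 has free pages.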
>>>>>> The option was introduced because we didn't want to introduce the
>>>>>> new hard dependency. Since we'll have it anyway, I'm not sure if
>>>>>> keeping the option for legacy mode makes any sense.
>>>>> Oh yes, you're right. Drop it is!
>>>>>
>>>>>>> As for using RTE_MAX_NUMA_NODES, I don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).
>>>>>> At least ARMv7 builds commonly do not ship the libnuma package.
>>>>> Do you mean libnuma builds for ARMv7 are not available? Or do you mean the libnuma package is not installed by default?
>>>>>
>>>>> If it's the latter, then I believe it's not installed by default anywhere, but if using the distribution version of DPDK, libnuma will be taken care of via the package manager. Presumably building from source can be taken care of with pkg-config/meson.
>>>>>
>>>>> Or do you mean ARMv7 does not have libnuma for their arch at all, in any distro?
>>>> libnuma builds for ARMv7 are not available in most of the distros. I
>>>> didn't check all, but here are the results for Ubuntu:
>>>>        https://packages.ubuntu.com/search?suite=bionic&arch=armhf&searchon=names&keywords=libnuma
>>>>
>>>> You may see that Ubuntu 18.04 (bionic) has no libnuma package for
>>>> 'armhf' and also 'powerpc' platforms.
>>>>
>>> That's a difficulty. Do these platforms support NUMA? In other words, could we replace this flag with just outright disabling NUMA support?
>>
>> Many platforms don't support NUMA, so they don't really need libnuma.
>>
>> Mandating libnuma will also break several things:
>>
>>     - cross build for ARM on x86 - which is among the preferred build
>> methods for many in the ARM community.
>>
>>    - many of the embedded SoCs are without NUMA support, and they use a smaller
>> rootfs (e.g. Yocto).  It will be a burden to add libnuma there.
>>
>
> OK, point taken.
>
> So, the alternative would be to have the ability to outright disable NUMA support (either with a new option, or reworking this one - I would prefer a new one, since this one is confusingly named). Meaning, report all cores as socket 0, report all hardware as socket 0, report all memory as socket 0 and never care about NUMA nodes anywhere.
>
> Would that work? E.g. by default, make libnuma a hard dependency on x86 Linux (but allow disabling it), but disable it everywhere else?

I think you may just rename RTE_EAL_NUMA_AWARE_HUGEPAGES to
something like RTE_EAL_NUMA_SUPPORT and keep all the defaults as is, i.e.
  * globally disabled
  * enabled for linux
  * disabled for armv7a, dpaa, dpaa2 and stingray.

Meson could handle everything dynamically.

>>
>>>
>>>
>>>>>>> For those compiling from source - are there any supported
>>>>>>> distributions which don't package libnuma? I don't see much sense
>>>>>>> in keeping libnuma optional, IMO. This is of course up to the tech
>>>>>>> board to decide, but the "without libnuma it's basically
>>>>>>> broken" argument is very strong in my opinion :)
>>>>>>>
>>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Anatoly
>
>
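
The rename proposed above, combined with Anatoly's "report everything as socket 0" fallback, can be sketched as a toy model. This is Python for illustration only, not DPDK build or EAL code: the `NUMA_SUPPORT_DEFAULTS` table, the layer names and both helper functions are hypothetical, and the "later config layer overrides earlier one" behaviour is an assumption about how stacked config files would resolve the option.

```python
from typing import Dict, Iterable

# Hypothetical per-layer defaults for the proposed RTE_EAL_NUMA_SUPPORT
# option, mirroring the list in the reply: globally disabled, enabled for
# linux, disabled again for armv7a, dpaa, dpaa2 and stingray.
NUMA_SUPPORT_DEFAULTS: Dict[str, bool] = {
    "common": False,
    "linux": True,
    "armv7a": False,
    "dpaa": False,
    "dpaa2": False,
    "stingray": False,
}


def numa_support_enabled(config_layers: Iterable[str]) -> bool:
    """Resolve the option across stacked config layers.

    Assumption: layers are applied in order and a later layer overrides an
    earlier one; layers that do not mention the option leave it unchanged.
    """
    enabled = False
    for layer in config_layers:
        if layer in NUMA_SUPPORT_DEFAULTS:
            enabled = NUMA_SUPPORT_DEFAULTS[layer]
    return enabled


def reported_socket_id(hw_node: int, numa_support: bool) -> int:
    """With NUMA support disabled, every core, device and page reports socket 0."""
    return hw_node if numa_support else 0


if __name__ == "__main__":
    for layers in (["common", "linux", "x86_64"], ["common", "linux", "armv7a"]):
        enabled = numa_support_enabled(layers)
        print(layers, "->", enabled, "socket for hw node 1:", reported_socket_id(1, enabled))
```

A generic Linux target ends up with the option enabled and real socket IDs, while an armv7a target ends up with it disabled and everything reported as socket 0, so libnuma would only be needed where the option is on.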