From mboxrd@z Thu Jan 1 00:00:00 1970
To: "Burakov, Anatoly", Asaf Sinai, "dev@dpdk.org", Thomas Monjalon
From: Ilya Maximets
Date: Mon, 26 Nov 2018 17:10:32 +0300
In-Reply-To: <7edc7b3b-4a35-3b90-a10a-de67e7f31261@intel.com>
References: <2b09cec8-0883-2ed2-0264-aeef871ea6a9@intel.com>
 <518f9333-8d80-0fa2-d391-b4c8df181508@intel.com>
 <12283bd1-ea0d-38d1-f64d-508596e48cd9@intel.com>
 <6ce31b20-19ea-ccaf-17d4-f36ab3959710@samsung.com>
 <5a045a4f-4037-cebf-ea02-7018c28918d0@samsung.com>
 <7edc7b3b-4a35-3b90-a10a-de67e7f31261@intel.com>
Subject: Re: [dpdk-dev] CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES: no difference in memory pool allocations, when enabling/disabling this configuration
List-Id: DPDK patches and discussions

On 26.11.2018 16:42, Burakov, Anatoly wrote:
> On 26-Nov-18 1:20 PM, Ilya Maximets wrote:
>> On 26.11.2018 16:16, Ilya Maximets wrote:
>>> On 26.11.2018 15:50, Burakov, Anatoly wrote:
>>>> On 26-Nov-18 11:43 AM, Burakov, Anatoly wrote:
>>>>> On 26-Nov-18 11:33 AM, Asaf Sinai wrote:
>>>>>> Hi Anatoly,
>>>>>>
>>>>>> We did not check it with
"testpmd", only with our application.
>>>>>> From the beginning, we did not enable this configuration (look at attached files), and everything works fine.
>>>>>> Of course, we rebuild DPDK whenever we change the configuration.
>>>>>> Please note that we use DPDK 17.11.3, maybe this is why it works fine?
>>>>>
>>>>> Just tested with DPDK 17.11, and yes, it does work the way you are describing. This is not intended behavior. I will look into it.
>>>>>
>>>>
>>>> +CC author of commit introducing CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES.
>>>>
>>>> Looking at the code, i think this config option needs to be reworked and we should clarify what we mean by this option. It appears that i've misunderstood what this option was actually intended to do, and i also think its naming could be improved because it's confusing and misleading.
>>>>
>>>> In 17.11, this option does *not* prevent EAL from using NUMA - it merely disables using libnuma to perform memory allocation. This looks like intended (if counter-intuitive) behavior - disabling this option will simply revert DPDK to working as it did before this option was introduced (i.e. best-effort allocation). This is why your code still works - because EAL still does allocate memory on socket 1, and *knows* that it's socket 1 memory. It still supports NUMA.
>>>>
>>>> The commit message for these changes states that the actual purpose of this option is to enable "balanced" hugepage allocation. In case of cgroups limitations, previously, DPDK would've exhausted all hugepages on master core's socket before attempting to allocate from other sockets, but by the time we've reached cgroups limits on numbers of hugepages, we might not have reached socket 1 and thus missed out on the pages we could've allocated, but didn't. Using libnuma solves this issue, because now we can allocate pages on sockets we want, instead of hoping we won't run out of hugepages before we get the memory we need.
>>>>
>>>> In 18.05 onwards, this option works differently (and arguably wrongly). More specifically, it disallows allocations on sockets other than 0, and it also makes it so that EAL does not check which socket the memory *actually* came from. So, not only is allocating memory from socket 1 disabled, but allocating from socket 0 may even get you memory from socket 1!
>>>
>>> I'd consider this a bug.
>>>
>>>>
>>>> +CC Thomas
>>>>
>>>> The CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES option is a misnomer, because it makes it seem like this option disables NUMA support, which is not the case.
>>>>
>>>> I would also argue that it is not relevant to 18.05+ memory subsystem, and should only work in legacy mode, because it is *impossible* to make it work right in the new memory subsystem, and here's why:
>>>>
>>>> Without libnuma, we have no way of "asking" the kernel to allocate a hugepage on a specific socket - instead, any allocation will most likely happen on the socket from which the allocation request came. For example, if user program's lcore is on socket 1, allocation on socket 0 will actually allocate a page on socket 1.
>>>>
>>>> If we don't check for page's NUMA node affinity (which is what currently happens) - we get performance degradation because we may unintentionally allocate memory on wrong NUMA node. If we do check for this - then allocation of memory on socket 1 from lcore on socket 0 will almost never succeed, because kernel will always give us pages on socket 0.
>>>>
>>>> Put simply, there is no sane way to make this option work for the new memory subsystem - IMO it should be dropped, and libnuma should be made a hard dependency on Linux.
>>>
>>> I agree that the new memory model cannot work without libnuma, i.e. it will
>>> lead to unpredictable memory allocations with no respect for the requested
>>> socket_ids. I also agree that CONFIG_RTE_EAL_NUMA_AWARE_HUGEPAGES is only
>>> sane for a legacy memory model.
>>> It looks like we have no other choice than to just drop the option and make
>>> the code unconditional, i.e. have a hard dependency on libnuma.
>>>
>>
>> We, probably, could compile this code and have a hard dependency only for
>> platforms with 'RTE_MAX_NUMA_NODES > 1'.
>
> Well, as long as legacy mode stays supported, we have to keep the option. The "drop" part was referring to supporting it under the new memory system, not a literal drop from config files.

The option was introduced because we didn't want to introduce the new hard
dependency. Since we'll have it anyway, I'm not sure if keeping the option for
legacy mode makes any sense.

>
> As for using RTE_MAX_NUMA_NODES, i don't think it's merited. Distributions cannot deliver different DPDK versions based on the number of sockets on a particular machine - so it would have to be a hard dependency for distributions anyway (does any distribution ship DPDK without libnuma?).

At least ARMv7 builds commonly do not ship a libnuma package.

>
> For those compiling from source - are there any supported distributions which don't package libnuma? I don't see much sense in keeping libnuma optional. This is of course up to the tech board to decide, but the "without libnuma it's basically broken" argument is very strong in my opinion :)
>
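
The cgroup scenario quoted above (hugepages on the master core's socket being used up before socket 1 is ever reached) can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not DPDK code: `struct machine`, `alloc_best_effort()` and `alloc_balanced()` are hypothetical names invented for illustration, the kernel is reduced to a per-socket counter of free pages, and the real per-socket steering done through libnuma is not shown.

```c
#include <assert.h>

#define NUM_SOCKETS 2   /* assumption: a hypothetical two-socket machine */

/* Toy model of the system, not a DPDK structure. */
struct machine {
    int free_pages[NUM_SOCKETS]; /* free hugepages per socket */
    int cgroup_limit;            /* max hugepages this process may map */
};

/*
 * Best-effort allocation (no libnuma): the caller cannot choose a socket,
 * so pages come from the master core's socket until it is empty, and only
 * then from the next socket. The cgroup limit may be hit before that.
 */
static void alloc_best_effort(struct machine m, int master_socket,
                              int got[NUM_SOCKETS])
{
    int total = 0;

    assert(master_socket >= 0 && master_socket < NUM_SOCKETS);
    for (int s = 0; s < NUM_SOCKETS; s++)
        got[s] = 0;

    for (int step = 0; step < NUM_SOCKETS; step++) {
        int s = (master_socket + step) % NUM_SOCKETS;

        while (m.free_pages[s] > 0 && total < m.cgroup_limit) {
            m.free_pages[s]--;
            got[s]++;
            total++;
        }
    }
}

/*
 * Balanced allocation (what steering by socket makes possible): request
 * exactly the wanted number of pages on each socket in turn.
 */
static void alloc_balanced(struct machine m, const int want[NUM_SOCKETS],
                           int got[NUM_SOCKETS])
{
    int total = 0;

    for (int s = 0; s < NUM_SOCKETS; s++) {
        got[s] = 0;
        while (got[s] < want[s] && m.free_pages[s] > 0 &&
               total < m.cgroup_limit) {
            m.free_pages[s]--;
            got[s]++;
            total++;
        }
    }
}
```

With 8 free pages on each socket, a cgroup limit of 4 and a request of 2 pages per socket, the best-effort model started on socket 0 ends up with 4 pages on socket 0 and none on socket 1, while the balanced model gets 2 on each. That is the "missed out on the pages we could've allocated" case described in the quoted text.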