From: "Tan, Jianfeng"
To: "Ananyev, Konstantin", Panu Matilainen, "dev@dpdk.org"
Date: Thu, 10 Mar 2016 09:36:33 +0800
Subject: Re: [dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores
Message-ID: <56E0CFA1.7030303@intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B1AA0C@irsmsx105.ger.corp.intel.com>
List-Id: patches and discussions about DPDK

On 3/10/2016 3:33 AM, Ananyev, Konstantin
wrote:
>
>>>>>>>>>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>>>>>>>>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
>>>>>>>>>>>> This patch adds option, --avail-cores, to use lcores which are
>>>>>>>>>>>> available by calling pthread_getaffinity_np() to narrow down
>>>>>>>>>>>> detected cores before parsing coremask (-c), corelist (-l), and
>>>>>>>>>>>> coremap (--lcores).
>>>>>>>>>>>>
>>>>>>>>>>>> Test example:
>>>>>>>>>>>> $ taskset 0xc0000 ./examples/helloworld/build/helloworld \
>>>>>>>>>>>> --avail-cores -m 1024
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jianfeng Tan
>>>>>>>>>>>> Acked-by: Neil Horman
>>>>>>>>>>> Hmm, to me this sounds like something that should be done always so
>>>>>>>>>>> there's no need for an option. Or if there's a chance it might do the
>>>>>>>>>>> wrong thing in some rare circumstance then perhaps there should be a
>>>>>>>>>>> disabler option instead?
>>>>>>>>>> Thanks for comments.
>>>>>>>>>>
>>>>>>>>>> Yes, there's a use case that we cannot handle.
>>>>>>>>>>
>>>>>>>>>> If we make it as default, DPDK applications may fail to start, when user
>>>>>>>>>> specifies a core in isolcpus and its parent process (say bash) has a
>>>>>>>>>> cpuset affinity that excludes isolcpus. Originally, DPDK applications
>>>>>>>>>> just blindly do pthread_setaffinity_np() and it always succeeds because
>>>>>>>>>> it always has root privilege to change any cpu affinity.
>>>>>>>>>>
>>>>>>>>>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>>>>>>>>>> flagged as undetected (in my older implementation) and leads to failure.
>>>>>>>>>> To make it correct, we would always add "taskset mask" (or other ways)
>>>>>>>>>> before DPDK application cmd lines.
>>>>>>>>>>
>>>>>>>>>> How do you think?
>>>>>>>>> I still think it sounds like something that should be done by default
>>>>>>>>> and maybe be overridable with some flag, rather than the other way
>>>>>>>>> around.
>>>>>>>>> Another alternative might be detecting the cores always but if
>>>>>>>>> running as root, override but with a warning.
>>>>>>>> For your second solution, only root can setaffinity to isolcpus?
>>>>>>>> Your first solution seems like a promising way for me.
>>>>>>>>
>>>>>>>>> But I dont know, just wondering. To look at it from another angle: why
>>>>>>>>> would somebody use this new --avail-cores option and in what
>>>>>>>>> situation, if things "just work" otherwise anyway?
>>>>>>>> For DPDK applications, the most common case to initialize DPDK is like
>>>>>>>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
>>>>>>>> to specify which cores to run and how much hugepages are used. Suppose
>>>>>>>> we need this dpdk-app to run in a container, users already give those
>>>>>>>> information when they build up the cgroup for it to run inside, this
>>>>>>>> option or this patch is to make DPDK more smart to discover how much
>>>>>>>> resource will be used. Make sense?
>>>>>>> But then, all we need might be just a script that would extract this
>>>>>>> information from the system
>>>>>>> and form a proper cmdline parameter for DPDK?
>>>>>> Yes, a script will work. Or to construct (argc, argv) to call
>>>>>> rte_eal_init() in the application. But as Neil Horman once suggested, a
>>>>>> simple pthread_getaffinity_np() will get all things done. So if it worth
>>>>>> a patch here?
>>>>> Don't know...
>>>>> Personally I would prefer not to put extra logic inside EAL.
>>>>> For me - there are too many different options already.
>>>> Then how about make it default in rte_eal_cpu_init()? And it is already
>>>> known it will bring trouble to those use isolcpus users, they need to
>>>> add "taskset [mask]" before starting a DPDK app.
>>> As I said - provide a script?
>> Yes. But what I want to say is this script is hard to be right, if there
>> are different kinds of limitations. (Barely happen though :-) )
> My thought was to keep dpdk code untouched - i.e.
> let it still blindly set_pthread_affinity()
> based on the input parameters, and in addition provide a script for those
> who want to run in '--avail-cores' mode.
> So it could do 'taskset -p $$' and then either form -c parameter list for
> the app, or check existing -c/-l/--lcores parameter and complain if a
> pcpu that is not allowed is detected.
> But ok, might be it is easier and more convenient to have this logic
> inside EAL, than in a separate script.
>
>>> Same might be for amount of hugepage memory available to the user?
>> Ditto. Limitations like hugetlbfs quota, cgroup hugetlb, some are used
>> by apps themselves (more like an artificial argument) ...
>>>>> From other side looking at the patch itself:
>>>>> You are updating lcore_count and lcore_config[], based on physical cpu
>>>>> availability, but these days it is not always one-to-one mapping
>>>>> between EAL lcore and physical cpu.
>>>>> Shouldn't that be taken into account?
>>>> I have not seen the problem so far, because this work is done before
>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores). If a core
>>>> is disabled here, it's like it is not detected in rte_eal_cpu_init(). Or
>>>> could you please give more hints?
>>> I didn't test the changes, so probably I am missing something.
>>> Let's say a user is allowed to use only cpus 0-3.
>>> If he would type with:
>>> --avail-cores --lcores='(1-7)@2',
>>> then only lcores 1-3 would be started.
>>> Again if user would specify '2@(1-7)' it would also be undetected
>>> that cpus 4-7 are not available to the user.
>>> Is that so?
>> After reading the code:
>> For case --lcores='(1-7)@2', lcores 1-7 would be started, and bind to
>> pcore 2.
>> For case --lcores='2@(1-7)', this will fail with "core 4 unavailable".
>>
>> It's because:
>> a. although 1:1 mapping is built-up and flagged as detected if pcore is
>> found in sysfs. (ROLE_RTE, cpuset, detected is true)
>> b. in the beginning of eal_parse_lcores(), "reset lcore config".
>> (ROLE_OFF, cpuset is empty, detected is still true)
>> c. pcore cpuset will be checked by convert_to_cpuset using the previous
>> "detected" value.
> Ok, my bad then - I misunderstood the code.
> Thanks for explanation.
> So if I get it right now - first inside
> lib/librte_eal/common/eal_common_lcore.c
> Both lcore_count and lcore_config relate to the pcpus.
> Then later, at lib/librte_eal/common/eal_common_options.c
> they are overwritten related to lcores information.
> Except lcore_config[].detected, which seems kept intact.
> Is that correct?

Yes, exactly. And I really appreciate that you raised this question for
discussion.

>
>> I have tested it with the patch. Result aligns above analysis.
>> For case --lcores='(1-7)@2': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='(1-7)@2'
>> ...
>> hello from core 2
>> hello from core 3
>> hello from core 4
>> hello from core 5
>> hello from core 6
>> hello from core 7
>> hello from core 1
>>
>> For case --lcores='2@(1-7)': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='2@(1-7)'
>> ...
>> EAL: core 4 unavailable
>> EAL: invalid parameter for --lcores
>> ...
>>
>> One thing may be worth mentioning: shall "detected" be maintained in struct
>> lcore_config? Maybe we need to maintain a data structure for pcores?
> Yes, it might be good to split pcpu and lcores information somehow,
> as it is a bit confusing right now.
> But I suppose this is a subject for another patch/discussion.

Yes, just another topic.

Thanks,
Jianfeng

> Konstantin
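
For reference, the wrapper-script alternative discussed in this thread (read the process affinity, then either build a core list for the app or reject a request that names a CPU outside the mask) could be sketched roughly as below. This is only an illustration in Python, not the patch and not any script shipped with DPDK. The helper names format_corelist and first_unavailable are hypothetical, and os.sched_getaffinity() is a Linux-only standard-library call standing in for 'taskset -p $$' / pthread_getaffinity_np().

```python
import os


def format_corelist(cpus):
    """Collapse CPU ids into a compact list such as '0-3,8'.

    Hypothetical helper: the output is meant to be usable as a
    corelist (-l) style argument, which is an assumption here.
    """
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(
        str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges
    )


def first_unavailable(requested, allowed):
    """Return the lowest requested CPU that is not allowed, or None."""
    missing = sorted(set(requested) - set(allowed))
    return missing[0] if missing else None


if __name__ == "__main__":
    # sched_getaffinity(0) reports the CPUs the current process may
    # run on; it only exists on Linux, hence the guard.
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        print("-l", format_corelist(allowed))
```

Under "taskset 0xf" the allowed set would be CPUs 0-3, so format_corelist() gives "0-3", and first_unavailable(range(1, 8), allowed) returns 4, which matches the CPU rejected in the '2@(1-7)' case quoted above.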