DPDK patches and discussions
From: "Tan, Jianfeng" <jianfeng.tan@intel.com>
To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>,
	Panu Matilainen <pmatilai@redhat.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores
Date: Thu, 10 Mar 2016 09:36:33 +0800
Message-ID: <56E0CFA1.7030303@intel.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B1AA0C@irsmsx105.ger.corp.intel.com>



On 3/10/2016 3:33 AM, Ananyev, Konstantin wrote:
>
>>>>>>>>>> On 3/8/2016 4:54 PM, Panu Matilainen wrote:
>>>>>>>>>>> On 03/04/2016 12:05 PM, Jianfeng Tan wrote:
>>>>>>>>>>>> This patch adds option, --avail-cores, to use lcores which are
>>>>>>>>>>>> available
>>>>>>>>>>>> by calling pthread_getaffinity_np() to narrow down detected cores
>>>>>>>>>>>> before
>>>>>>>>>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores).
>>>>>>>>>>>>
>>>>>>>>>>>> Test example:
>>>>>>>>>>>> $ taskset 0xc0000 ./examples/helloworld/build/helloworld \
>>>>>>>>>>>>             --avail-cores -m 1024
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jianfeng Tan <jianfeng.tan@intel.com>
>>>>>>>>>>>> Acked-by: Neil Horman <nhorman@tuxdriver.com>
>>>>>>>>>>> Hmm, to me this sounds like something that should be done always so
>>>>>>>>>>> there's no need for an option. Or if there's a chance it might do the
>>>>>>>>>>> wrong thing in some rare circumstance then perhaps there should be a
>>>>>>>>>>> disabler option instead?
>>>>>>>>>> Thanks for comments.
>>>>>>>>>>
>>>>>>>>>> Yes, there's a use case that we cannot handle.
>>>>>>>>>>
>>>>>>>>>> If we make it as default, DPDK applications may fail to start, when user
>>>>>>>>>> specifies a core in isolcpus and its parent process (say bash) has a
>>>>>>>>>> cpuset affinity that excludes isolcpus. Originally, DPDK applications
>>>>>>>>>> just blindly do pthread_setaffinity_np() and it always succeeds because
>>>>>>>>>> it always has root privilege to change any cpu affinity.
>>>>>>>>>>
>>>>>>>>>> Now, if we do the checking in rte_eal_cpu_init(), those lcores will be
>>>>>>>>>> flagged as undetected (in my older implementation) and leads to failure.
>>>>>>>>>> To make it correct, we would always add "taskset mask" (or other ways)
>>>>>>>>>> before DPDK application cmd lines.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>> I still think it sounds like something that should be done by default
>>>>>>>>> and maybe be overridable with some flag, rather than the other way
>>>>>>>>> around. Another alternative might be detecting the cores always but if
>>>>>>>>> running as root, override but with a warning.
>>>>>>>> For your second solution, only root can setaffinity to isolcpus?
>>>>>>>> Your first solution seems like a promising way for me.
>>>>>>>>
>>>>>>>>> But I dont know, just wondering. To look at it from another angle: why
>>>>>>>>> would somebody use this new --avail-cores option and in what
>>>>>>>>> situation, if things "just work" otherwise anyway?
>>>>>>>> For DPDK applications, the most common way to initialize DPDK is like
>>>>>>>> this: "$dpdk-app [options for DPDK] -- [options for app]", so users need
>>>>>>>> to specify which cores to run on and how many hugepages to use. Suppose
>>>>>>>> we need this dpdk-app to run in a container: users have already given
>>>>>>>> that information when they built the cgroup for it to run inside, and
>>>>>>>> this option (this patch) makes DPDK smart enough to discover how many
>>>>>>>> resources it may use. Does that make sense?
>>>>>>> But then, all we need might be just a script that would extract this information from the system
>>>>>>> and form a proper cmdline parameter for DPDK?
>>>>>> Yes, a script will work. Or to construct (argc, argv) to call
>>>>>> rte_eal_init() in the application. But as Neil Horman once suggested, a
>>>>>> simple pthread_getaffinity_np() will get everything done. So is it worth
>>>>>> a patch here?
>>>>> Don't know...
>>>>> Personally I would prefer not to put extra logic inside EAL.
>>>>> For me - there are too many different options already.
>>>> Then how about making it the default in rte_eal_cpu_init()? It is already
>>>> known that this will cause trouble for isolcpus users, who would need to
>>>> add "taskset [mask]" before starting a DPDK app.
>>> As I said - provide a script?
>> Yes. But my point is that such a script is hard to get right when there
>> are different kinds of limitations. (That rarely happens, though :-) )
> My thought was to keep dpdk code untouched - i.e. let it still blindly call pthread_setaffinity_np()
> based on the input parameters, and in addition provide a script for those who want to run
> in '--avail-cores' mode.
> So it could do 'taskset -p $$' and then either form the -c parameter list for the app,
> or check the existing -c/-l/--lcores parameters and complain if a disallowed pcpu is detected.
> But ok, maybe it is easier and more convenient to have this logic inside EAL
> than in a separate script.
>
>>> Same might be for amount of hugepage memory available to the user?
>> Ditto. Limitations such as the hugetlbfs quota and cgroup hugetlb; some
>> are imposed by the app itself (more like an artificial argument) ...
>>>>>    From other side looking at the patch itself:
>>>>> You are updating lcore_count and lcore_config[],based on physical cpu availability,
>>>>> but these days it is not always one-to-one mapping between EAL lcore and physical cpu.
>>>>> Shouldn't that be taken into account?
>>>> I have not seen the problem so far, because this work is done before
>>>> parsing coremask (-c), corelist (-l), and coremap (--lcores). If a core
>>>> is disabled here, it's like it is not detected in rte_eal_cpu_init(). Or
>>>> could you please give more hints?
>>> I didn't test these changes, so I am probably missing something.
>>> Let's say the user is allowed to use only cpus 0-3.
>>> If he were to run with:
>>>    --avail-cores  --lcores='(1-7)@2',
>>> then only lcores 1-3 would be started.
>>> Again, if the user specified '2@(1-7)', it would also go undetected
>>> that cpus 4-7 are not available to the user.
>>> Is that so?
>> After reading the code:
>> For case --lcores='(1-7)@2', lcores 1-7 would be started, and bind to
>> pcore 2.
>> For case --lcores='2@(1-7)', this will fail with "core 4 unavailable".
>>
>> It's because:
>> a. At first, a 1:1 mapping is built up, and an entry is flagged as
>> detected if its pcore is found in sysfs. (ROLE_RTE, cpuset, detected is true)
>> b. At the beginning of eal_parse_lcores(), the lcore config is reset.
>> (ROLE_OFF, cpuset is empty, detected is still true)
>> c. The pcore cpuset is then checked by convert_to_cpuset using the
>> previous "detected" value.
> Ok, my bad then - I misunderstood the code.
> Thanks for explanation.
> So if I get it right now - first inside lib/librte_eal/common/eal_common_lcore.c
> Both lcore_count and lcore_config relate to the pcpus.
> Then later, at lib/librte_eal/common/eal_common_options.c
> they are overwritten related to lcores information.
> Except lcore_config[].detected, which seems kept intact.
> Is that correct?

Yes, exactly. I really appreciate you raising this question for
discussion.
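
A minimal sketch of the flow confirmed above, written as a toy model rather
than the actual EAL code. The struct lcore_model type, the model_* function
names, and the MAX_CPUS limit are all hypothetical; only the three steps
(1:1 detection, reset that keeps "detected", cpuset check against
"detected") come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 8 /* hypothetical limit for this sketch */

/* Toy stand-in for one lcore_config[] entry (hypothetical model). */
struct lcore_model {
    bool detected;         /* pcpu with this index is usable */
    bool role_rte;         /* lcore with this index is enabled */
    bool cpuset[MAX_CPUS]; /* pcpus this lcore may run on */
};

/* Step a: 1:1 mapping; only pcpus present in allowed_mask are detected. */
static void model_cpu_init(struct lcore_model *cfg, unsigned allowed_mask)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        bool ok = ((allowed_mask >> i) & 1u) != 0;
        cfg[i].detected = ok;
        cfg[i].role_rte = ok;
        for (int j = 0; j < MAX_CPUS; j++)
            cfg[i].cpuset[j] = ok && (i == j);
    }
}

/* Step b: reset roles and cpusets, but leave "detected" untouched. */
static void model_reset(struct lcore_model *cfg)
{
    for (int i = 0; i < MAX_CPUS; i++) {
        cfg[i].role_rte = false;
        for (int j = 0; j < MAX_CPUS; j++)
            cfg[i].cpuset[j] = false;
    }
}

/*
 * Step c: map lcores [l_lo, l_hi] onto pcpus [p_lo, p_hi].
 * Returns -1 on success, or the index of the first pcpu that is
 * not detected (nothing is enabled in that case).
 */
static int model_map(struct lcore_model *cfg, int l_lo, int l_hi,
                     int p_lo, int p_hi)
{
    assert(l_lo >= 0 && l_lo <= l_hi && l_hi < MAX_CPUS);
    assert(p_lo >= 0 && p_lo <= p_hi && p_hi < MAX_CPUS);

    for (int p = p_lo; p <= p_hi; p++)
        if (!cfg[p].detected)
            return p;

    for (int l = l_lo; l <= l_hi; l++) {
        cfg[l].role_rte = true;
        for (int p = p_lo; p <= p_hi; p++)
            cfg[l].cpuset[p] = true;
    }
    return -1;
}

/* Count the lcores that ended up enabled. */
static int model_count_enabled(const struct lcore_model *cfg)
{
    int n = 0;
    for (int i = 0; i < MAX_CPUS; i++)
        if (cfg[i].role_rte)
            n++;
    return n;
}
```

With an allowed mask of 0xf, mapping lcores 1-7 onto pcpu 2 succeeds in this
model and enables seven lcores, while mapping lcore 2 onto pcpus 1-7 is
rejected at pcpu 4, which mirrors the two test runs quoted below.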

>
>> I have tested it with the patch. The result matches the analysis above.
>> For case --lcores='(1-7)@2': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='(1-7)@2'
>> ...
>> hello from core 2
>> hello from core 3
>> hello from core 4
>> hello from core 5
>> hello from core 6
>> hello from core 7
>> hello from core 1
>>
>> For case --lcores='2@(1-7)': sudo taskset 0xf
>> ./examples/helloworld/build/helloworld --avail-cores --lcores='2@(1-7)'
>> ...
>> EAL: core 4 unavailable
>> EAL: invalid parameter for --lcores
>> ...
>>
>> One thing may be worth mentioning: should "detected" be maintained in
>> struct lcore_config? Maybe we need to maintain a data structure for pcores?
> Yes, it might be good to split pcpu and lcores information somehow,
> as it is a bit confusing right now.
> But I suppose this is a subject for another patch/discussion.

Yes, just another topic.
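
For illustration only, one way such a split could look: a separate per-pcpu
table filled from pthread_getaffinity_np(), independent of any lcore
configuration. The struct pcpu_info type, the pcpu_table_* functions, and
PCPU_MAX are hypothetical names for this sketch and are not part of DPDK;
pthread_getaffinity_np() and the CPU_* macros are the glibc ones on Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

#define PCPU_MAX 128 /* hypothetical table size for this sketch */

/* Hypothetical per-pcpu record, kept apart from lcore configuration. */
struct pcpu_info {
    bool allowed; /* pcpu is present in the thread's affinity mask */
};

/* Fill tbl[0..n-1] from an affinity set; returns the number allowed. */
static int pcpu_table_from_set(const cpu_set_t *set,
                               struct pcpu_info *tbl, int n)
{
    int count = 0;

    assert(n > 0 && n <= CPU_SETSIZE);
    for (int i = 0; i < n; i++) {
        tbl[i].allowed = CPU_ISSET(i, set) != 0;
        if (tbl[i].allowed)
            count++;
    }
    return count;
}

/*
 * Query the calling thread's affinity and fill the table from it.
 * Returns the number of allowed pcpus among the first n, or -1 if
 * the affinity could not be read.
 */
static int pcpu_table_detect(struct pcpu_info *tbl, int n)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    return pcpu_table_from_set(&set, tbl, n);
}
```

A caller could declare "struct pcpu_info tbl[PCPU_MAX];" and run
pcpu_table_detect(tbl, PCPU_MAX) once at startup, then validate any
-c/-l/--lcores request against tbl[] instead of lcore_config[].detected.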

Thanks,
Jianfeng

> Konstantin
>
>

Thread overview: 63+ messages
2016-01-24 18:49 [dpdk-dev] [RFC] eal: add cgroup-aware resource self discovery Jianfeng Tan
2016-01-25 13:46 ` Neil Horman
2016-01-26  2:22   ` Tan, Jianfeng
2016-01-26 14:19     ` Neil Horman
2016-01-27 12:02       ` Tan, Jianfeng
2016-01-27 17:30         ` Neil Horman
2016-01-29 11:22 ` [dpdk-dev] [PATCH] eal: make resource initialization more robust Jianfeng Tan
2016-02-01 18:08   ` Neil Horman
2016-02-22  6:08   ` Tan, Jianfeng
2016-02-22 13:18     ` Neil Horman
2016-02-28 21:12   ` Thomas Monjalon
2016-02-29  1:50     ` Tan, Jianfeng
2016-03-04 10:05 ` [dpdk-dev] [PATCH] eal: add option --avail-cores to detect lcores Jianfeng Tan
2016-03-08  8:54   ` Panu Matilainen
2016-03-08 17:38     ` Tan, Jianfeng
2016-03-09 13:05       ` Panu Matilainen
2016-03-09 13:53         ` Tan, Jianfeng
2016-03-09 14:01           ` Ananyev, Konstantin
2016-03-09 14:17             ` Tan, Jianfeng
2016-03-09 14:44               ` Ananyev, Konstantin
2016-03-09 14:55                 ` Tan, Jianfeng
2016-03-09 15:17                   ` Ananyev, Konstantin
2016-03-09 17:45                     ` Tan, Jianfeng
2016-03-09 19:33                       ` Ananyev, Konstantin
2016-03-10  1:36                         ` Tan, Jianfeng [this message]
2016-05-18 12:46         ` David Marchand
2016-05-19  2:25           ` Tan, Jianfeng
2016-06-30 13:43             ` Thomas Monjalon
2016-07-01  0:52               ` Tan, Jianfeng
2016-04-26 12:39   ` Tan, Jianfeng
2016-03-04 10:58 ` [dpdk-dev] [PATCH] eal: make hugetlb initialization more robust Jianfeng Tan
2016-03-08  1:42   ` [dpdk-dev] [PATCH v2] " Jianfeng Tan
2016-03-08  8:46     ` Tan, Jianfeng
2016-05-04 11:07     ` Sergio Gonzalez Monroy
2016-05-04 11:28       ` Tan, Jianfeng
2016-05-04 12:25     ` Sergio Gonzalez Monroy
2016-05-09 10:48   ` [dpdk-dev] [PATCH v3] " Jianfeng Tan
2016-05-10  8:54     ` Sergio Gonzalez Monroy
2016-05-10  9:11       ` Tan, Jianfeng
2016-05-12  0:44   ` [dpdk-dev] [PATCH v4] " Jianfeng Tan
2016-05-17 16:39     ` David Marchand
2016-05-18  7:56       ` Sergio Gonzalez Monroy
2016-05-18  9:34         ` David Marchand
2016-05-19  2:00       ` Tan, Jianfeng
2016-05-17 16:40     ` Thomas Monjalon
2016-05-18  8:06       ` Sergio Gonzalez Monroy
2016-05-18  9:38         ` David Marchand
2016-05-19  2:11         ` Tan, Jianfeng
2016-05-31  3:37 ` [dpdk-dev] [PATCH v5] eal: fix allocating all free hugepages Jianfeng Tan
2016-06-06  2:49   ` Pei, Yulong
2016-06-08 11:27   ` Sergio Gonzalez Monroy
2016-06-30 13:34     ` Thomas Monjalon
2016-08-31  3:07 ` [dpdk-dev] [PATCH v2] eal: restrict cores detection Jianfeng Tan
2016-08-31 15:30   ` Stephen Hemminger
2016-09-01  1:15     ` Tan, Jianfeng
2016-09-01  1:31 ` [dpdk-dev] [PATCH v3] " Jianfeng Tan
2016-09-02 16:53   ` Bruce Richardson
2016-09-16 14:04     ` Thomas Monjalon
2016-09-16 14:02   ` Thomas Monjalon
2016-12-02 17:48   ` [dpdk-dev] [PATCH v4] eal: restrict cores auto detection Jianfeng Tan
2016-12-08 18:19     ` Thomas Monjalon
2016-12-09 15:14       ` Bruce Richardson
2016-12-21 14:31         ` Thomas Monjalon
