From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from mga02.intel.com (mga02.intel.com [134.134.136.20])
 by dpdk.org (Postfix) with ESMTP id 0C0475B2E
 for ; Thu, 28 Mar 2019 15:42:03 +0100 (CET)
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by orsmga101.jf.intel.com with ESMTP; 28 Mar 2019 07:42:02 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.60,280,1549958400"; d="scan'208";a="218416176"
Received: from dhunt5-mobl2.ger.corp.intel.com (HELO [10.237.210.26]) ([10.237.210.26])
 by orsmga001.jf.intel.com with ESMTP; 28 Mar 2019 07:42:00 -0700
To: "Burakov, Anatoly" , dev@dpdk.org
Cc: liang.j.ma@intel.com
References: <20190222114551.30692-1-david.hunt@intel.com>
 <20190328131354.25222-1-david.hunt@intel.com>
 <6d478ef6-cc2a-71c0-d7b0-de86c04eb41c@intel.com>
From: "Hunt, David" 
Message-ID: <2a10fb6d-348c-18ab-d727-c4179243d72e@intel.com>
Date: Thu, 28 Mar 2019 14:42:00 +0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101
 Thunderbird/60.5.3
MIME-Version: 1.0
In-Reply-To: <6d478ef6-cc2a-71c0-d7b0-de86c04eb41c@intel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Content-Language: en-US
Subject: Re: [dpdk-dev] [PATCH v2] examples/distributor: detect high frequency cores
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
X-List-Received-Date: Thu, 28 Mar 2019 14:42:04 -0000

On 28/3/2019 1:58 PM, Burakov, Anatoly wrote:
> On 28-Mar-19 1:13 PM, David Hunt wrote:
>> The distributor application is bottlenecked by the distributor core,
>> so if we can give more frequency to this core, then the overall
>> performance of the application may increase.
>>
>> This patch uses the rte_power_get_capabilities() API to query the
>> cores provided in the core mask, and if any high frequency cores are
>> found (e.g. Turbo Boost is enabled), we will pin the distributor
>> workload to that core.
>>
>> Signed-off-by: Liang Ma 
>> Signed-off-by: David Hunt 
>> ---
>
> <...>
>
>> +    if (power_lib_initialised)
>> +        rte_power_exit(rte_lcore_id());
>>       printf("\nCore %u exiting tx task.\n", rte_lcore_id());
>>       return 0;
>>   }
>> @@ -575,9 +582,35 @@ lcore_worker(struct lcore_params *p)
>>           if (num > 0)
>>               app_stats.worker_bursts[p->worker_id][num-1]++;
>>       }
>> +    if (power_lib_initialised)
>> +        rte_power_exit(rte_lcore_id());
>> +    rte_free(p);
>>       return 0;
>>   }
>>
>> +static int
>> +init_power_library(void)
>> +{
>> +    int ret = 0, lcore_id;
>> +    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>> +        if (rte_lcore_is_enabled(lcore_id)) {
>
> Please correct me if i'm wrong, but RTE_LCORE_FOREACH_SLAVE already
> checks if the lcore is enabled.

You're correct, I'll fix in next version.

>
> <...>
>
>> +    if (power_lib_initialised) {
>> +        /*
>> +         * Here we'll pre-assign lcore ids to the rx, tx and
>> +         * distributor workloads if there's higher frequency
>> +         * on those cores e.g. if Turbo Boost is enabled.
>> +         * It's also worth mentioning that it will assign cores in a
>> +         * specific order, so that if there's less than three
>> +         * available, the higher frequency cores will go to the
>> +         * distributor first, then rx, then tx.
>> +         */
>> +        RTE_LCORE_FOREACH_SLAVE(lcore_id) {
>> +
>> +            rte_power_get_capabilities(lcore_id, &lcore_cap);
>> +
>> +            if (lcore_cap.turbo == 1) {
>> +                priority_num++;
>> +                switch (priority_num) {
>> +                case 1:
>> +                    distr_core_id = lcore_id;
>> +                    printf("Distributor on priority core %d\n",
>> +                            lcore_id);
>> +                    break;
>> +                case 2:
>> +                    rx_core_id = lcore_id;
>> +                    printf("Rx on priority core %d\n",
>> +                            lcore_id);
>> +                    break;
>> +                case 3:
>> +                    tx_core_id = lcore_id;
>> +                    printf("Tx on priority core %d\n",
>> +                            lcore_id);
>> +                    break;
>> +                default:
>> +                    break;
>> +                }
>
> This seems to be doing the same thing as right below (assigning lcore
> id's in order), yet in one case you use a switch, and in the other you
> use a simple loop. I don't see priority_num used anywhere else, so you
> might as well simplify this loop to be similar to what you have below,
> with "skip-if-not-turbo, if not assigned, assign-and-continue" type flow.

They're doing different things.

The loop with the switch is looking for up to three priority cores, and
storing those choices in distr_core_id, tx_core_id and rx_core_id. This
is because we don't know which are the priority cores ahead of time.
priority_num is used in the switch statement, and when it finds a
priority core, it increments, so we know which variable to assign with
the next available priority core. Imagine we have turbo enabled on
cores 2, 4 and 6. That's what I'm trying to solve here.
Then, when we get to the next loop, we're just assigning the
non-priority cores if the three key workloads have not already been
assigned a core, hence the simple loop, using the remaining cores.

I looked at simplifying the flow, but as far as I can see, I need two
stages: a 'discovery' stage for the priority cores first, then whatever
is left can be done in a normal loop.

Does that make sense, or am I missing an obvious refactor opportunity?

>
> Once that is fixed,
>
> Reviewed-by: Anatoly Burakov 
>