From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Nicolau, Radu"
To: "Ananyev, Konstantin", "Van Haaren, Harry", Jerin Jacob
Cc: Honnappa Nagarahalli, "Richardson, Bruce", dev@dpdk.org,
 jerinj@marvell.com, nd
Date: Wed, 7 Oct 2020 11:44:39 +0100
Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
References: <20200908105211.10066-1-radu.nicolau@intel.com>
 <46118f3466274596a663d7d44abb680a@intel.com>
 <20200925102805.GD923@bricha3-MOBL.ger.corp.intel.com>

On 10/6/2020 11:13 AM, Ananyev, Konstantin wrote:
>>> -----Original Message-----
>>> From: Jerin Jacob
>>> Sent: Monday, October 5, 2020 5:35 PM
>>> To: Nicolau, Radu
>>> Cc: Honnappa Nagarahalli; Richardson, Bruce; Ananyev, Konstantin;
>>> Van Haaren, Harry; dev@dpdk.org; jerinj@marvell.com; nd
>>> Subject: Re: [dpdk-dev] [PATCH v1] event/sw: performance improvements
>>>
>>> On Tue, Sep 29, 2020 at 2:32 PM Nicolau, Radu wrote:
>>>>
>>>> On 9/28/2020 5:02 PM, Honnappa Nagarahalli wrote:
>>>>>
>>>>>>> Add minimum burst throughout the scheduler pipeline and a flush
>>>>>>> counter. Replace ring API calls with a local single-threaded
>>>>>>> implementation where possible.
>>>>>>>
>>>>>>> Signed-off-by: Radu Nicolau <radu.nicolau@intel.com>
>>>>>>>
>>>>>>> Thanks for the patch, a few comments inline.
>>>>>>>
>>>>>>> Why not make these APIs part of the rte_ring library? You could
>>>>>>> further optimize them by keeping the indices on the same cacheline.
>>>>>>> I'm not sure there is any need for non-thread-safe rings outside
>>>>>>> this particular case.
>>>>>>> [Honnappa] I think if we add the APIs, we will find the use cases.
>>>>>>> But, more than that, I understand that the rte_ring structure is
>>>>>>> exposed to the application.
>>>>>>> The reason for doing that is the inline functions that rte_ring
>>>>>>> provides. IMO, we should still maintain modularity and should not
>>>>>>> use the internals of the rte_ring structure outside of the library.
>>>>>>> +1 to that.
>>>>>>>
>>>>>>> BTW, is there any real perf benefit from such micro-optimisation?
>>>>>> I'd tend to view these as use-case specific, and I'm not sure we
>>>>>> should clutter up the ring library with yet more functions,
>>>>>> especially since they can't be mixed with the existing
>>>>>> enqueue/dequeue functions, as they don't use the head pointers.
>>>>> IMO, the ring library is pretty well organized with the recent
>>>>> addition of the HTS/RTS modes. This can be one of the modes and
>>>>> should allow us to use the existing functions (though additional
>>>>> functions are required as well).
>>>>> The other concern I have is that this implementation can be further
>>>>> optimized by using a single cache line for the pointers. It uses 2
>>>>> cache lines just because of the layout of the rte_ring structure.
>>>>> There was a question earlier about the performance improvements of
>>>>> this patch. Are there any % performance improvements that can be
>>>>> shared?
>>>>> It is also possible to change the above functions to use the
>>>>> head/tail pointers from the producer or the consumer cache line
>>>>> alone to check for perf differences.
>>>> I don't have a % for the final improvement for this change alone,
>>>> but there was some improvement in the memory overhead measurable
>>>> during development, which very likely resulted in the whole
>>>> optimization having more headroom.
>>>>
>>>> I agree that this may be further optimized, maybe by having a local
>>>> implementation of a ring-like container instead.
>>> Have we decided on the next steps for this patch? Is the plan to
>>> supersede this patch and have a different one in the rte_ring
>>> subsystem?
>> My preference is to merge this version of the patch:
>> 1) The ring helper functions are stripped down to the SW PMD usage,
>> and are not valid for general use.
>> 2) Adding static inline APIs in an LTS release without extensive
>> testing doesn't seem a good idea.
>>
>> If Honnappa is OK with the above solution for 20.11, we can see about
>> moving the rings part of the code to the rte_ring library location in
>> 21.02, and give ourselves some time to settle the usage/API before
>> the next LTS.
>>
> As ring library maintainer I share Honnappa's concern that another
> library does not use the public ring API but instead accesses ring
> internals directly. Obviously such a coding practice is not welcome,
> as it makes it harder to maintain/extend the ring library in the
> future.
> About 2) - these new APIs can (/should) be marked as experimental
> anyway.
> As another thing - it is still unclear what performance gain we are
> talking about here. Is it really worth it compared to just using
> SP/SC?

The change itself came after I analyzed the memory-bound sections of
the code. I just did a quick test and got about a 3.5% improvement in
throughput - maybe not much, but significant for such a small change,
and depending on the use case it may be more.

As for the implementation itself, I would favour having a custom
ring-like container in the PMD code; this will solve the issue of
using rte_ring internals while still allowing for full optimisation.
If this is acceptable, I will follow up by tomorrow.
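To make the idea concrete, below is a minimal sketch of what such a
PMD-local container could look like. This is illustrative only, not
the actual patch code: the names (iq_ring and friends) and the size
are made up, and it assumes a single-threaded caller - which is what
lets us drop the atomics/barriers and keep both indices on one cache
line, unlike rte_ring where prod and cons sit on separate cache lines
by design.

/* Sketch of a single-threaded ring-like container local to the PMD.
 * No atomics or memory barriers: only one thread ever touches it.
 * Both indices share a cache line with the start of the data array,
 * avoiding the two-cache-line index layout of struct rte_ring.
 */
#include <stdint.h>

#define IQ_RING_SZ 512			/* power of two, illustrative */
#define IQ_RING_MASK (IQ_RING_SZ - 1)

struct iq_ring {
	uint16_t write_idx;	/* next slot to write; wraps naturally */
	uint16_t read_idx;	/* next slot to read; wraps naturally */
	void *ring[IQ_RING_SZ];
};

static inline uint16_t
iq_ring_count(const struct iq_ring *r)
{
	/* valid across uint16_t wraparound for power-of-two sizes */
	return r->write_idx - r->read_idx;
}

static inline uint16_t
iq_ring_enqueue_burst(struct iq_ring *r, void **objs, uint16_t n)
{
	uint16_t space = IQ_RING_SZ - iq_ring_count(r);
	uint16_t i;

	if (n > space)
		n = space;
	for (i = 0; i < n; i++)
		r->ring[(r->write_idx + i) & IQ_RING_MASK] = objs[i];
	r->write_idx += n;
	return n;
}

static inline uint16_t
iq_ring_dequeue_burst(struct iq_ring *r, void **objs, uint16_t n)
{
	uint16_t avail = iq_ring_count(r);
	uint16_t i;

	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		objs[i] = r->ring[(r->read_idx + i) & IQ_RING_MASK];
	r->read_idx += n;
	return n;
}

Since in this sketch the scheduler core would be the only reader and
writer of these internal queues, plain loads and stores on a single
shared cache line are safe, and the compiler is free to keep both
indices in registers across a burst.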