From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ferruh.yigit@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id 6CC53BD28
 for <dev@dpdk.org>; Fri, 13 Jan 2017 16:51:03 +0100 (CET)
Received: from fmsmga006.fm.intel.com ([10.253.24.20])
 by fmsmga103.fm.intel.com with ESMTP; 13 Jan 2017 07:51:02 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.33,222,1477983600"; d="scan'208";a="53508299"
Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.237.220.38])
 ([10.237.220.38])
 by fmsmga006.fm.intel.com with ESMTP; 13 Jan 2017 07:50:59 -0800
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
References: <1482832175-27199-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <1484212646-10338-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <1484212646-10338-16-git-send-email-jerin.jacob@caviumnetworks.com>
 <6bb9980b-f546-38d5-044a-63507510f6a5@intel.com>
 <20170113081641.GA17635@localhost.localdomain>
 <b528e477-194e-ad1a-2b5e-3422c6d49721@intel.com>
 <20170113145753.GB13558@localhost.localdomain>
Cc: dev@dpdk.org, konstantin.ananyev@intel.com, thomas.monjalon@6wind.com,
 bruce.richardson@intel.com, jianbo.liu@linaro.org, viktorin@rehivetech.com,
 santosh.shukla@caviumnetworks.com, John Griffin <john.griffin@intel.com>,
 Fiona Trahe <fiona.trahe@intel.com>,
 Deepak Kumar Jain <deepak.k.jain@intel.com>
From: Ferruh Yigit <ferruh.yigit@intel.com>
Message-ID: <c10e7cba-badb-3599-9603-c886d43e995d@intel.com>
Date: Fri, 13 Jan 2017 15:50:59 +0000
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101
 Thunderbird/45.6.0
MIME-Version: 1.0
In-Reply-To: <20170113145753.GB13558@localhost.localdomain>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device
 memory read/write API
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 13 Jan 2017 15:51:04 -0000

On 1/13/2017 2:57 PM, Jerin Jacob wrote:
> On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote:
>> On 1/13/2017 8:17 AM, Jerin Jacob wrote:
>>> On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote:
>>>> Hi Jerin,
>>>>
>>>> On 1/12/2017 9:17 AM, Jerin Jacob wrote:
>>>> <...>
>>>>
>>>>> +#include <rte_io.h>
>>>>> +
>>>>>  /* CSR write macro */
>>>>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \
>>>>> -	(void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \
>>>>> -			= (val)))
>>>>> +#define ADF_CSR_WR(csrAddr, csrOffset, val)		\
>>>>> +	rte_write32(val, (((uint8_t *)csrAddr) + csrOffset))
>>>>
>>>> For IA, this update introduces an extra compiler barrier (rte_io_wmb()),
>>>> which is not strictly required; is this correct?
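To make the question concrete, here is a minimal sketch of the old and new CSR write macros as they would expand on IA. It assumes, as the x86 commit quoted in the reply does, that rte_io_wmb() is a compiler barrier and that rte_write32() is that barrier followed by a relaxed store. The SKETCH_* names and the GCC/Clang-style asm barrier are illustrative stand-ins, not DPDK's definitions.

```c
#include <stdint.h>

/* Stand-in for rte_compiler_barrier() (GCC/Clang syntax): emits no
 * instruction, only forbids the compiler from moving memory accesses
 * across this point. */
#define SKETCH_COMPILER_BARRIER() __asm__ volatile("" ::: "memory")

/* Old macro: a bare volatile 32-bit store to csrAddr + csrOffset. */
#define SKETCH_CSR_WR_OLD(csrAddr, csrOffset, val) \
	(void)((*((volatile uint32_t *)(((uint8_t *)(csrAddr)) + (csrOffset)))) = (val))

/* New macro as it would expand on IA (assumption): the same volatile
 * store, preceded by the compiler barrier that rte_io_wmb() maps to. */
#define SKETCH_CSR_WR_NEW(csrAddr, csrOffset, val) \
	do { \
		SKETCH_COMPILER_BARRIER(); \
		*((volatile uint32_t *)(((uint8_t *)(csrAddr)) + (csrOffset))) = (val); \
	} while (0)
```

In both cases the store itself is the same volatile access; the only difference is that the barrier also stops the compiler from moving ordinary (non-volatile) memory accesses across the write.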
>>>
>>> AFAIK, a compiler barrier is required for IA. I am not an IA expert; if
>>> someone thinks it needs to be changed, I can fix it in a following commit
>>> in this patch series by making rte_io_wmb() and rte_io_rmb() empty.
>>>
>>> Let me know.
>>>
>>> AFAIK, Linux kernel code has a barrier in readl/writel for IA.
>>>
>>> Typically we don't use the non-relaxed versions in the fast path. In the
>>> fast path, drivers typically issue an explicit write barrier and then ring
>>> the doorbell with a relaxed write. IMO, in any event, it won't cause a
>>> performance regression.
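The fast-path pattern described above can be sketched roughly as follows, assuming a hypothetical descriptor ring with a memory-mapped tail (doorbell) register: descriptors are filled with ordinary stores, one explicit write barrier orders them, and the doorbell is then rung with a relaxed write. All sketch_* names are hypothetical, and the compiler barrier is only a placeholder so the sketch compiles anywhere; a real driver uses its platform's write barrier at that point.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the explicit write barrier (e.g. rte_wmb() in a real
 * driver); a compiler-only barrier is NOT enough on weakly ordered CPUs. */
#define SKETCH_WMB() __asm__ volatile("" ::: "memory")

/* Hypothetical descriptor the device reads by DMA. */
struct sketch_desc {
	uint64_t buf_addr;
	uint32_t len;
	uint32_t flags;
};

/* Hypothetical TX ring with a doorbell register. */
struct sketch_ring {
	struct sketch_desc *desc;
	uint32_t size;               /* number of descriptors, power of two */
	uint32_t tail;               /* next free slot, owned by software */
	volatile uint32_t *tail_reg; /* device doorbell register */
};

/* Relaxed doorbell write: a single volatile store, no barrier. */
static inline void sketch_write32_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Post a burst of n buffers, then ring the doorbell once. */
static void sketch_post_burst(struct sketch_ring *r, const uint64_t *bufs,
			      const uint32_t *lens, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		struct sketch_desc *d = &r->desc[r->tail];

		d->buf_addr = bufs[i];
		d->len = lens[i];
		d->flags = 1; /* hypothetical "descriptor valid" bit */
		r->tail = (r->tail + 1) & (r->size - 1);
	}
	/* One barrier per burst: the descriptor stores must be visible to
	 * the device before it observes the new tail value. */
	SKETCH_WMB();
	sketch_write32_relaxed(r->tail, r->tail_reg);
}
```

The point of the design is that the barrier cost is paid once per burst rather than once per register write, which is why the non-relaxed accessors rarely appear in the fast path.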
>>>
>>> [dpdk-master] $ git show 70c343bdc8c33a51a9db23cd58122bdfc120a58f
>>> commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f
>>> Author: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> Date:   Mon Dec 5 06:36:49 2016 +0530
>>>
>>>     eal/x86: define I/O device memory barriers for IA
>>>
>>>     The patch does not provide any functional change for IA.
>>>     I/O barriers are mapped to existing smp barriers.
>>>
>>>     CC: Bruce Richardson <bruce.richardson@intel.com>
>>>     CC: Konstantin Ananyev <konstantin.ananyev@intel.com>
>>>     Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>>
>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>> index 00b1cdf..4eac666 100644
>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h
>>> @@ -61,6 +61,12 @@ extern "C" {
>>>  
>>>  #define rte_smp_rmb() rte_compiler_barrier()
>>>  
>>> +#define rte_io_mb() rte_mb()
>>> +
>>> +#define rte_io_wmb() rte_compiler_barrier()
>>> +
>>> +#define rte_io_rmb() rte_compiler_barrier()
>>> +
>>>  /*------------------------- 16 bit atomic operations -------------------------*/
>>>  
>>>  #ifndef RTE_FORCE_INTRINSICS
>>>
>>>>
>>>> If so, does it make sense to override these functions for x86, and make
>>>> rte_writeX = rte_writeX_relaxed
>>>> rte_readX = rte_readX_relaxed
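The suggested override could look roughly like this. It is a sketch assuming the GCC/Clang predefined architecture macros and hypothetical sketch_* accessor names rather than DPDK's actual headers: on x86 the non-relaxed accessors become aliases of the relaxed ones, while other architectures keep a barrier (here __sync_synchronize(), used only as a conservative placeholder).

```c
#include <stdint.h>

/* Relaxed accessors: one volatile access each, no ordering guarantee. */
static inline void sketch_write32_relaxed(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

static inline uint32_t sketch_read32_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

#if defined(__x86_64__) || defined(__i386__)
/* Proposed x86 override: no barrier in the non-relaxed accessors. */
#define sketch_write32(val, addr) sketch_write32_relaxed((val), (addr))
#define sketch_read32(addr) sketch_read32_relaxed((addr))
#else
/* Other architectures keep an I/O barrier; a full barrier is used here
 * only as a conservative placeholder for the real I/O barriers. */
static inline void sketch_write32(uint32_t val, volatile void *addr)
{
	__sync_synchronize();
	sketch_write32_relaxed(val, addr);
}

static inline uint32_t sketch_read32(const volatile void *addr)
{
	uint32_t val = sketch_read32_relaxed(addr);

	__sync_synchronize();
	return val;
}
#endif
```

The trade-off being asked about: a volatile access is never elided or reordered against other volatile accesses, but without the compiler barrier the compiler may still move ordinary (non-volatile) memory accesses across it.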
>>>>
>>>>>  
>>>>>  /* CSR read macro */
>>>>> -#define ADF_CSR_RD(csrAddr, csrOffset) \
>>>>> -	(*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)))
>>>>> +#define ADF_CSR_RD(csrAddr, csrOffset)			\
>>>>> +	rte_read32((((uint8_t *)csrAddr) + csrOffset))
>>>>
>>>> This patchset both introduces the new rte_readX/rte_writeX functions and
>>>> applies them to the drivers.
>>>>
>>>> While applying them, it changes the behavior.
>>>> For example, the code above was doing a plain read, but after the update
>>>> it does a read followed by a read memory barrier.
>>>>
>>>> What do you think about having this patchset update the usage in a way
>>>> that keeps the behavior exactly the same, e.g. using rte_read32_relaxed
>>>> for this case, and doing the architecture-related updates in a separate
>>>> patchset?
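A minimal sketch of the suggested behavior-preserving first step, assuming a relaxed read accessor with the semantics of rte_read32_relaxed(), modelled here by a hypothetical sketch_read32_relaxed(). The converted macro performs exactly the same single volatile load as the old one, with no barrier added.

```c
#include <stdint.h>

/* Models a relaxed 32-bit MMIO read: one volatile load, no barrier. */
static inline uint32_t sketch_read32_relaxed(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Old macro, unchanged: a bare volatile load. */
#define SKETCH_CSR_RD_OLD(csrAddr, csrOffset) \
	(*((volatile uint32_t *)(((uint8_t *)(csrAddr)) + (csrOffset))))

/* Phase 1: the same load, now routed through the relaxed accessor. */
#define SKETCH_CSR_RD_PHASE1(csrAddr, csrOffset) \
	sketch_read32_relaxed(((uint8_t *)(csrAddr)) + (csrOffset))
```

A second, architecture-focused patch would then switch the macro to the non-relaxed read (rte_read32() in this series), so any ordering or performance change is isolated to that one commit.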
>>>
>>> We need to use rte_read32 in this commit; otherwise it will break on ARM.
>>> That was the whole point of this patchset.
>>
>> Why does it break ARM? Is it because rte_*mb() is updated for ARM in this
>> patchset (patch 7/29)?
> 
> Yes.
> 
> 
>>
>> I believe it is good to make these modifications in two phases:
> 
> It is already in two phases: first the API is introduced with its
> implementation, then it is enabled in each driver. Why do you think the
> other way around is better?

For two reasons:
1- If something goes wrong, it is easier to find the source of the problem.
2- It makes the architectural changes obvious; right now they are a little
hard to see, which again matters for item 1.

But I would also like to hear more comments before you change/try anything.

> I can rework and test it if there is any added value. If you are concerned
> about git bisect ability, I don't think we are losing that in this
> model.
> 
> Thoughts?
> 
>> - First, replace the old usage with rte_readX/rte_writeX while keeping
>> exactly the same behavior.
>>
>> - Second, do the architecture-specific changes, both at the EAL and the
>> driver level if required.
>>
>> Thanks,
>> ferruh
>>
>>> As for performance regressions, we can always check by measuring the
>>> delta between this changeset and the previous one. If you think I need
>>> to make rte_io_wmb()/rte_io_rmb() empty for IA, I can do that
>>> as well.
>>>
>>>
>>>>
>>>> This makes it easy both to see the architecture-specific updates and to
>>>> trace any possible performance issues introduced by this patchset.
>>>>
>>>>>  
>>>>>  #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL
>>>>>  #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL
>>>>>
>>>>
>>