From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga14.intel.com (mga14.intel.com [192.55.52.115]) by dpdk.org (Postfix) with ESMTP id 6CC53BD28 for ; Fri, 13 Jan 2017 16:51:03 +0100 (CET) Received: from fmsmga006.fm.intel.com ([10.253.24.20]) by fmsmga103.fm.intel.com with ESMTP; 13 Jan 2017 07:51:02 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.33,222,1477983600"; d="scan'208";a="53508299" Received: from fyigit-mobl1.ger.corp.intel.com (HELO [10.237.220.38]) ([10.237.220.38]) by fmsmga006.fm.intel.com with ESMTP; 13 Jan 2017 07:50:59 -0800 To: Jerin Jacob References: <1482832175-27199-1-git-send-email-jerin.jacob@caviumnetworks.com> <1484212646-10338-1-git-send-email-jerin.jacob@caviumnetworks.com> <1484212646-10338-16-git-send-email-jerin.jacob@caviumnetworks.com> <6bb9980b-f546-38d5-044a-63507510f6a5@intel.com> <20170113081641.GA17635@localhost.localdomain> <20170113145753.GB13558@localhost.localdomain> Cc: dev@dpdk.org, konstantin.ananyev@intel.com, thomas.monjalon@6wind.com, bruce.richardson@intel.com, jianbo.liu@linaro.org, viktorin@rehivetech.com, santosh.shukla@caviumnetworks.com, John Griffin , Fiona Trahe , Deepak Kumar Jain From: Ferruh Yigit Message-ID: Date: Fri, 13 Jan 2017 15:50:59 +0000 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0 MIME-Version: 1.0 In-Reply-To: <20170113145753.GB13558@localhost.localdomain> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v3 15/29] crypto/qat: use eal I/O device memory read/write API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 13 Jan 2017 15:51:04 -0000 On 1/13/2017 2:57 PM, Jerin Jacob wrote: > On Fri, Jan 13, 2017 at 11:32:29AM +0000, Ferruh Yigit wrote: >> On 1/13/2017 8:17 AM, Jerin Jacob wrote: >>> On Thu, Jan 12, 2017 at 07:09:22PM +0000, Ferruh Yigit wrote: >>>> Hi Jerin, >>>> >>>> On 1/12/2017 9:17 AM, Jerin Jacob wrote: >>>> <...> >>>> >>>>> +#include >>>>> + >>>>> /* CSR write macro */ >>>>> -#define ADF_CSR_WR(csrAddr, csrOffset, val) \ >>>>> - (void)((*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset)) \ >>>>> - = (val))) >>>>> +#define ADF_CSR_WR(csrAddr, csrOffset, val) \ >>>>> + rte_write32(val, (((uint8_t *)csrAddr) + csrOffset)) >>>> >>>> For IA, this update introduces an extra compiler barrier (rte_io_wmb()), >>>> which is indeed not a must, is this correct? >>> >>> AFAIK, Compiler barrier is required for IA. I am not an IA expert, if >>> someone thinks it needs to changed then I can fix it in following commit >>> in this patch series by making rte_io_wmb() and rte_io_rmb() as empty. >>> >>> Let me know. >>> >>> AFAIK, Linux kernel code has a barrier in readl/writel for IA. >>> >>> Typically we don't use any non relaxed versions in fast path.In fast >>> typically all the drivers has explicit write barrier for doorbell write >>> and followed by a relaxed version of write. IMO, In any event, it won't >>> generate performance regression. >>> >>> [dpdk-master] $ git show >>> 70c343bdc8c33a51a9db23cd58122bdfc120a58f >>> commit 70c343bdc8c33a51a9db23cd58122bdfc120a58f >>> Author: Jerin Jacob >>> Date: Mon Dec 5 06:36:49 2016 +0530 >>> >>> eal/x86: define I/O device memory barriers for IA >>> >>> The patch does not provide any functional change for IA. >>> I/O barriers are mapped to existing smp barriers. >>> >>> CC: Bruce Richardson >>> CC: Konstantin Ananyev >>> Signed-off-by: Jerin Jacob >>> >>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic.h >>> b/lib/librte_eal/common/include/arch/x86/rte_atomic.h >>> index 00b1cdf..4eac666 100644 >>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic.h >>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic.h >>> @@ -61,6 +61,12 @@ extern "C" { >>> >>> #define rte_smp_rmb() rte_compiler_barrier() >>> >>> +#define rte_io_mb() rte_mb() >>> + >>> +#define rte_io_wmb() rte_compiler_barrier() >>> + >>> +#define rte_io_rmb() rte_compiler_barrier() >>> + >>> /*------------------------- 16 bit atomic operations >>> * -------------------------*/ >>> >>> #ifndef RTE_FORCE_INTRINSICS >>> >>>> >>>> If so, does it make sense to override these functions for x86, and make >>>> rte_writeX = rte_writeX_relaxed >>>> rte_readX = rte_readX_relaxed >>>> >>>>> >>>>> /* CSR read macro */ >>>>> -#define ADF_CSR_RD(csrAddr, csrOffset) \ >>>>> - (*((volatile uint32_t *)(((uint8_t *)csrAddr) + csrOffset))) >>>>> +#define ADF_CSR_RD(csrAddr, csrOffset) \ >>>>> + rte_read32((((uint8_t *)csrAddr) + csrOffset)) >>>> >>>> This patchset both introduces new rte_readX/rte_writeX functions, also >>>> applies them into drivers. >>>> >>>> While applying them, it changes the behavior. >>>> Like above code was doing a read, but after update it does read and >>>> read_memory_barrier. >>>> >>>> What do you think this patchset updates usage in a manner that keeps >>>> behavior exact same. Like using rte_read32_relaxed for this case. >>>> And doing architecture related updates in a different patchset? >>> >>> Need to use rte_read32 at this commit otherwise it will break for ARM. >>> That's was all point for this patchset. >> >> Why it breaks the ARM, is it because rte_*mb() updated for ARM in this >> patchset (patch 7/29) ? > > Yes. > > >> >> I believe it is good to make these modifications in two phase: > > It is in two phases only. First introduced the API with implementation and > enabled in each driver. Why did you think other-way around it is better? For two things: 1- If something goes wrong, find the source of problem easier. 2- Make architectural changes obvious, right now it is a little hard to see, and this again for item 1. But I also would like to hear more comments before you change/try anything. > I can rework and test if there is any value addition. If you concerned > about git bisect ability then I don't think we are loosing that in this > model. > > Thoughts? > >> - First replace old usage with rte_readX/rte_writeX while keeping exact >> same behavior >> >> - Second, do architecture specific changes. Both in eal and drivers >> level if required. >> >> Thanks, >> ferruh >> >>> For performance regression, we can always verify by taking delta >>> between this changeset and the previous changeset. If you think, I need >>> to make rte_io_wmb()/rte_io_rmb() as empty for IA then I could do that >>> as well. >>> >>> >>>> >>>> This both makes easy to see architecture specific updates, and makes >>>> easy to trace any possible performance issues by this patchset. >>>> >>>>> >>>>> #define ADF_BANK_INT_SRC_SEL_MASK_0 0x4444444CUL >>>>> #define ADF_BANK_INT_SRC_SEL_MASK_X 0x44444444UL >>>>> >>>> >>