From: Thinh Tran
To: David Marchand
Cc: dev, David Christensen
Date: Mon, 10 Feb 2020 11:53:32 -0600
Subject: Re: [dpdk-dev] [PATCH v2] eal/ppc64: improve rte_rdtsc with ppc_get_timebase
References: <20200128210233.691-1-thinhtr@linux.vnet.ibm.com> <20200131220336.103874-1-thinhtr@linux.vnet.ibm.com>
List-Id: DPDK patches and discussions

Hi,

Sorry for the late response.
Yes, this is an enhancement for powerpc.
On our POWER8/9 systems, __ppc_get_timebase() calls
__builtin_ppc_get_timebase(), which results in a single mftb instruction:

    __ppc_get_timebase():
        mftb rA

On a 64-bit implementation, this instruction copies the entire time base
(TBU||TBL) into rA, which significantly reduces the number of cycles
compared to the current code (the same as the last block of the glibc
helper you quoted).

To demonstrate, take the simple reciprocal division perf test on POWER9,
which calls rte_rdtsc() heavily:

- without this patch:

Validating unsigned 32bit division.
32bit Division results:
Total number of cycles normal division     : 73744549935
Total number of cycles reciprocal division : 76954877143
Cycles per division(normal)     : 17.17
Cycles per division(reciprocal) : 17.92

Validating unsigned 64bit division.
64bit Division results:
Total number of cycles normal division     : 73932937051
Total number of cycles reciprocal division : 74598584339
Cycles per division(normal)     : 17.21
Cycles per division(reciprocal) : 17.37

Validating unsigned 64bit division with 32bit divisor.
64bit Division results:
Total number of cycles normal division     : 78660556171
Total number of cycles reciprocal division : 74566630579
Cycles per division(normal)     : 18.31
Cycles per division(reciprocal) : 17.36

Validating division by power of 2.
64bit Division results:
Total number of cycles normal division     : 1097
Total number of cycles reciprocal division : 1201
Cycles per division(normal)     : 17.14
Cycles per division(reciprocal) : 18.77
Test OK
RTE>>

- with the patch:

Validating unsigned 32bit division.
32bit Division results:
Total number of cycles normal division     : 41690214596
Total number of cycles reciprocal division : 44446377795
Cycles per division(normal)     : 9.71
Cycles per division(reciprocal) : 10.35

Validating unsigned 64bit division.
64bit Division results:
Total number of cycles normal division     : 41687737031
Total number of cycles reciprocal division : 41666358052
Cycles per division(normal)     : 9.71
Cycles per division(reciprocal) : 9.70

Validating unsigned 64bit division with 32bit divisor.
64bit Division results:
Total number of cycles normal division     : 46386969228
Total number of cycles reciprocal division : 41663680498
Cycles per division(normal)     : 10.80
Cycles per division(reciprocal) : 9.70

Validating division by power of 2.
64bit Division results:
Total number of cycles normal division     : 618
Total number of cycles reciprocal division : 618
Cycles per division(normal)     : 9.66
Cycles per division(reciprocal) : 9.66
Test OK
RTE>>

I hope this explains it.

Thanks,
Thinh Tran

On 2/5/2020 3:29 PM, David Marchand wrote:
> On Fri, Jan 31, 2020 at 11:04 PM Thinh Tran wrote:
>>
>> __ppc_get_timebase() is GNU extension and is more efficient
>
> The commit title and log are quite short and give little idea on what
> this is about.
>
>
> I had a look at this glibc helper:
>
> /* Read the Time Base Register.  */
> static __inline__ uint64_t
> __ppc_get_timebase (void)
> {
> #if __GNUC_PREREQ (4, 8)
>   return __builtin_ppc_get_timebase ();
> #else
> # ifdef __powerpc64__
>   uint64_t __tb;
>   /* "volatile" is necessary here, because the user expects this assembly
>      isn't moved after an optimization.  */
>   __asm__ volatile ("mfspr %0, 268" : "=r" (__tb));
>   return __tb;
> # else  /* not __powerpc64__ */
>   uint32_t __tbu, __tbl, __tmp; \
>   __asm__ volatile ("0:\n\t"
>                     "mftbu %0\n\t"
>                     "mftbl %1\n\t"
>                     "mftbu %2\n\t"
>                     "cmpw %0, %2\n\t"
>                     "bne- 0b"
>                     : "=r" (__tbu), "=r" (__tbl), "=r" (__tmp));
>   return (((uint64_t) __tbu << 32) | __tbl);
> # endif  /* not __powerpc64__ */
> #endif
> }
>
> The last block is exactly the code we had in dpdk.
> So I suppose we are trying to use mfspr for register 268 which seems
> linked to timebase (looking at the linux kernel sources).
>
> Please, confirm this is an enhancement (and how this improves current
> ppc support).
> Thanks.
>
>
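
For illustration, the change discussed in this thread can be sketched roughly as below. This is only a sketch under stated assumptions, not the actual DPDK patch: `sketch_rdtsc` and `sketch_ticks_per_op` are hypothetical names, and the `clock_gettime` fallback is there only so the example also compiles on non-ppc64 hosts. On ppc64 with glibc, the whole read reduces to one call to `__ppc_get_timebase()` from `<sys/platform/ppc.h>`, with no mftbu/mftbl/mftbu retry loop.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() in the fallback path */
#endif

#include <stdint.h>

#if defined(__powerpc64__) && defined(__GLIBC__)
#include <sys/platform/ppc.h>    /* glibc: __ppc_get_timebase() */
#else
#include <time.h>                /* fallback for non-ppc64 builds only */
#endif

/* Hypothetical stand-in for rte_rdtsc(); not the actual DPDK code. */
static inline uint64_t
sketch_rdtsc(void)
{
#if defined(__powerpc64__) && defined(__GLIBC__)
	/* One 64-bit time base read instead of the 32-bit retry loop. */
	return __ppc_get_timebase();
#else
	/* Assumption: a monotonic nanosecond clock stands in for the
	 * time base so the sketch builds on other architectures. */
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Hypothetical helper: average ticks per operation, in the style of the
 * "Cycles per division" figures printed by the perf test. */
static inline double
sketch_ticks_per_op(uint64_t total_ticks, uint64_t ops)
{
	return ops == 0 ? 0.0 : (double)total_ticks / (double)ops;
}
```

A caller would read `sketch_rdtsc()` before and after the loop being measured, subtract the two values, and pass the difference and the iteration count to `sketch_ticks_per_op()`.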