From: "Polehn, Mike A"
To: "Richardson, Bruce", Moon-Sang Lee
Date: Wed, 13 Jan 2016 15:17:36 +0000
Message-ID: <745DB4B8861F8E4B9849C970520ABBF149847573@ORSMSX102.amr.corp.intel.com>
In-Reply-To: <20160113113432.GA7216@bricha3-MOBL3>
Cc: "dev@dpdk.org"
Subject: Re: [dpdk-dev] rte_prefetch0() is effective?
List-Id: patches and discussions about DPDK

Prefetches make a big difference because a powerful CPU like IA is always trying to find items to prefetch, and the priority of these is not always easy to determine. This is especially a problem across subroutine calls, since the compiler cannot determine what is of priority in the other subroutines, and the runtime CPU logic cannot always predict the future far enough ahead for all possible paths. That matters most when you have a cache miss, where the memory access takes eons of clock cycles and probably results in a CPU stall.

Until we get to the point of computers fully understanding the logic of the program and writing optimum code (putting programmers out of business), the programmer's understanding of what is important as the program progresses tells the programmer what is desirable to prefetch. It is difficult to determine whether the CPU is going to give the prefetch the same priority, so having a prefetch may or may not show up as a measurable performance improvement under some conditions, but having the prefetch in place can make the prefetch priority decision correct in the other cases, which gives a performance improvement.

Removing a prefetch without thinking through and fully understanding the logic of why it is there, or what the added cost is, if any (as in the case of calculating an address for the prefetch in a way that affects other current operations), is just plain amateur work. This is not to say that people never make bad judgments about what needs to be prefetched or place prefetches poorly, but a prefetch should only be removed if it is not logically proper for the expected runtime operation.

Only more primitive CPUs with no prefetch capabilities don't benefit from properly placed prefetches.

Mike

-----Original Message-----
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
Sent: Wednesday, January 13, 2016 3:35 AM
To: Moon-Sang Lee
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] rte_prefetch0() is effective?

On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote:
> I see code like the below in the examples directory, and I wonder whether it is effective.
> Coherent I/O is adopted in modern architectures, so I think that DMA
> initiation by rte_eth_rx_burst() might already fill the cache lines of
> RX buffers.
> Do I really need to call rte_prefetchX()?
>
>         nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
>                                  MAX_PKT_BURST);
>         ...
>         /* Prefetch and forward already prefetched packets */
>         for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
>                 rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
>                                 j + PREFETCH_OFFSET], void *));
>                 l3fwd_simple_forward(pkts_burst[j], portid,
>                                 qconf);
>         }
>

Good question.
When the first example apps using this style of prefetch were originally written, yes, there was a noticeable performance increase achieved by using the prefetch.
Since then, I'm not sure that anyone has checked with each generation of platforms whether the prefetches are still necessary and how much they help, but I suspect that they still help a bit and don't hurt performance.

It would be an interesting exercise to check whether the prefetch offsets used in code like the above can be adjusted to give better performance on our latest supported platforms.

/Bruce
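
For anyone wanting to try the offset experiment suggested above, here is a minimal sketch of the same prefetch-ahead pattern with the offset passed as a parameter, so different values can be timed against each other. It is not the l3fwd code: struct pkt, prefetch0(), process_pkt() and process_burst() are hypothetical names made up for illustration, and the GCC/Clang builtin __builtin_prefetch() stands in for rte_prefetch0() so the sketch builds without DPDK. The quoted snippet shows only the steady-state loop; the warm-up and tail loops here are assumptions about how such a loop would be completed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packet buffer; a real mbuf is more complex. */
struct pkt {
	uint8_t data[64];
};

/* Assumed stand-in for rte_prefetch0(): read access, keep in all cache levels. */
static inline void prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Hypothetical per-packet work: touches every byte of the payload. */
static uint32_t process_pkt(const struct pkt *p)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < sizeof(p->data); i++)
		sum += p->data[i];
	return sum;
}

/*
 * Process a burst while prefetching `offset` packets ahead.
 * Assumes offset >= 0. Signed arithmetic keeps (nb_rx - offset)
 * from wrapping when the burst is shorter than the offset.
 */
static uint32_t process_burst(struct pkt **pkts, int nb_rx, int offset)
{
	uint32_t total = 0;
	int j;

	/* Warm-up: prefetch the first `offset` packets (or fewer). */
	for (j = 0; j < offset && j < nb_rx; j++)
		prefetch0(pkts[j]->data);

	/* Steady state: prefetch packet j + offset, process packet j. */
	for (j = 0; j < nb_rx - offset; j++) {
		prefetch0(pkts[j + offset]->data);
		total += process_pkt(pkts[j]);
	}

	/* Tail: the remaining packets have already been prefetched. */
	for (; j < nb_rx; j++)
		total += process_pkt(pkts[j]);

	return total;
}
```

The result is the same for every offset; only the timing should differ, so one way to use this is to call process_burst() in a loop over realistic burst sizes with offsets such as 0, 1, 3 and 8 and compare cycle counts on the target platform. Whether any particular offset wins is something to measure rather than assume.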