Date: Wed, 13 Jan 2016 12:29:46 -0500
From: Matthew Hall
To: Bruce Richardson
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] rte_prefetch0() is effective?
Message-ID: <20160113172946.GA9514@mhcomputing.net>
In-Reply-To: <20160113113432.GA7216@bricha3-MOBL3>

On Wed, Jan 13, 2016 at 11:34:33AM +0000, Bruce Richardson wrote:
> When the first example apps using this style of prefetch were originally
> written, yes, there was a noticeable performance increase achieved by using
> the prefetch. Thereafter, I'm not sure that anyone has checked with each
> generation of platforms whether the prefetches are still necessary and how
> much they help, but I suspect that they still help a bit, and don't hurt
> performance.

FYI, as a community member, I find that this paragraph describes one of my top irritations with DPDK. The Intel accelerations, such as the prefetches or newer features like librte_power, are treated as one-off projects rather than as ongoing technical efforts that need periodic retesting and maintenance. As a result, after waiting over a month for a reply, I eventually discovered that librte_power had probably not worked correctly at all since at least Sandy Bridge, which by now is a very old server chip.
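To make the retesting point concrete, a prefetch like the ones in the example apps could carry its own explanation and a way to re-measure it. The following is a minimal sketch, not DPDK code, assuming GCC or Clang: `__builtin_prefetch()` stands in for `rte_prefetch0()`, and `struct pkt`, `struct burst`, `burst_init()`, `process_burst()`, `time_burst_ns()` and the prefetch offset are hypothetical names chosen for illustration. It visits packets in shuffled order (so the hardware prefetcher cannot help), optionally prefetches a few packets ahead, and times both variants so the benefit can be checked again on each new core generation.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for a packet buffer: one 64-byte cache line. */
struct pkt {
    uint8_t data[64];
};

/* Hypothetical burst of n packets, visited through a shuffled pointer array. */
struct burst {
    struct pkt *pool;  /* backing storage, released by burst_free() */
    struct pkt **ptrs; /* shuffled view into pool */
    size_t n;
};

/* Small xorshift PRNG so the shuffle does not depend on RAND_MAX. */
static uint64_t xorshift64(uint64_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Shuffle the visiting order so the hardware stream prefetcher cannot
 * guess the next address; only the software prefetch can help then.
 * Returns 0 on success, -1 on allocation failure. */
int burst_init(struct burst *b, size_t n, uint64_t seed)
{
    uint64_t s = seed ? seed : 1; /* xorshift state must be non-zero */
    b->pool = malloc(n * sizeof(*b->pool));
    b->ptrs = malloc(n * sizeof(*b->ptrs));
    b->n = n;
    if (b->pool == NULL || b->ptrs == NULL) {
        free(b->pool);
        free(b->ptrs);
        b->pool = NULL;
        b->ptrs = NULL;
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        b->pool[i].data[0] = (uint8_t)(i & 0xff);
        b->ptrs[i] = &b->pool[i];
    }
    for (size_t i = n; i > 1; i--) { /* Fisher-Yates shuffle */
        size_t j = (size_t)(xorshift64(&s) % i);
        struct pkt *tmp = b->ptrs[i - 1];
        b->ptrs[i - 1] = b->ptrs[j];
        b->ptrs[j] = tmp;
    }
    return 0;
}

void burst_free(struct burst *b)
{
    free(b->pool);
    free(b->ptrs);
}

/* Sum the first byte of every packet. With prefetch_offset > 0, the packet
 * that many slots ahead is prefetched before the current one is read, so
 * its line is hopefully in cache when the loop reaches it.
 * __builtin_prefetch(p, 0, 3) means "read, high temporal locality"; on x86,
 * GCC/Clang are expected to emit prefetcht0, as rte_prefetch0() does. */
uint64_t process_burst(struct pkt *const *pkts, size_t n,
                       size_t prefetch_offset)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (prefetch_offset > 0 && i + prefetch_offset < n)
            __builtin_prefetch(pkts[i + prefetch_offset]->data, 0, 3);
        sum += pkts[i]->data[0];
    }
    return sum;
}

/* Time one pass in nanoseconds; the sum is returned so it can be checked. */
double time_burst_ns(const struct burst *b, size_t prefetch_offset,
                     uint64_t *sum_out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *sum_out = process_burst(b->ptrs, b->n, prefetch_offset);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
           (double)(t1.tv_nsec - t0.tv_nsec);
}
```

For a meaningful comparison the working set should be much larger than the last-level cache (a few million packets, say), and each variant should be run several times in alternating order. The best offset will likely vary between core generations, which is one reason to record the hardware it was tuned on next to the code.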
The accelerations are also treated like black magic: no comments in the code explain how or why they work. So an outsider doing their best to measure things in VTune, in order to help with that ongoing testing and maintenance, cannot tell why something was done, or how it might be adjusted to work well in their environment if their hardware is older or newer than whatever undocumented hardware was used to develop the example. Nowhere that I know of states the reference platform and core generation used to develop an example, either, so I cannot even tell whether a given piece of code is current or old.

When I ask a high-level question, such as "Which Intel accelerations should one make sure are enabled to get the best performance?", it normally gets no reply. This makes life difficult, because the data sheet of a typical modern Intel core lists many dozens of accelerations, and there is no guidance on which of them matter most for DPDK. So I have no good idea where to focus my time to get the most out of technology that must have cost Intel millions or billions to create. To me that's very sad.

I hope we can make some resources available that explain the principles behind the accelerations, so that it is easier for the community to take part in maintaining them, and perhaps even to help create new ones.

Note: I read through all the subchapters here:

http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html

None of them give any details about CPU accelerations. They explain nothing specific about prefetching or branch prediction, only that these features exist and do things.

Sincerely,
Matthew.