From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 1 Aug 2014 14:08:22 -0700
From: Bruce Richardson
To: Neil Horman
Message-ID: <20140801210821.GF28495@localhost.localdomain>
In-Reply-To: <20140801204352.GF31979@hmsreliant.think-freely.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Organization: Intel Shannon Limited. Registered in Ireland. Registered Office: Collinstown Industrial Park, Leixlip, County Kildare. Registered Number: 308263. Business address: Dromore House, East Park, Shannon, Co. Clare.
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features

On Fri, Aug 01, 2014 at 04:43:52PM -0400, Neil Horman wrote:
> On Fri, Aug 01, 2014 at 12:22:22PM -0700, Bruce Richardson wrote:
> > On Fri, Aug 01, 2014 at 11:06:29AM -0400, Neil Horman wrote:
> > > On Thu, Jul 31, 2014 at 01:25:06PM -0700, Bruce Richardson wrote:
> > > > On Thu, Jul 31, 2014 at 04:10:18PM -0400, Neil Horman wrote:
> > > > > On Thu, Jul 31, 2014 at 11:36:32AM -0700, Bruce Richardson wrote:
> > > > > > On Thu, Jul 31, 2014 at 02:10:32PM -0400, Neil Horman wrote:
> > > > > > > On Thu, Jul 31, 2014 at 10:32:28AM -0400, Neil Horman wrote:
> > > > > > > > On Thu, Jul 31, 2014 at 03:26:45PM +0200, Thomas Monjalon wrote:
> > > > > > > > > 2014-07-31 09:13, Neil Horman:
> > > > > > > > > > On Wed, Jul 30, 2014 at 02:09:20PM -0700, Bruce Richardson wrote:
> > > > > > > > > > > On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > > > > > > > > > > > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > > > > > > > > > > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > > > > > > > > > > > Hey all-
> > > > > >
> > > > > > With regards to the general approach for runtime detection of
> > > > > > software functions, I wonder if something like this can be handled
> > > > > > by the packaging system? Is it possible to ship out a set of shared
> > > > > > libs compiled up for different instruction sets, and then at rpm
> > > > > > install time, symlink the appropriate library?
> > > > > > This would push the whole issue of detection of code paths outside
> > > > > > of the code, work across all our libraries, and ensure each user
> > > > > > got the best performance they could get from a binary.
> > > > > > Has something like this been done before? The building of all the
> > > > > > libraries could be scripted easily enough: just do multiple builds
> > > > > > using different EXTRA_CFLAGS each time, and move and rename the
> > > > > > .so's after each run.
> > > > >
> > > > > Sorry, I missed this in my last reply.
> > > > >
> > > > > In answer to your question, the short version is that such a thing
> > > > > is roughly possible from a packaging standpoint, but completely
> > > > > unworkable from a distribution standpoint. We could certainly build
> > > > > the dpdk multiple times and rename all the shared objects to some
> > > > > variant name representative of the optimizations we built in for
> > > > > certain cpu flags, but then we would be shipping X versions of the
> > > > > dpdk, and any application (say OVS) that made use of the dpdk would
> > > > > need to provide a version linked against each variant to be useful
> > > > > when making a product, and each end user would need to manually
> > > > > select (or run a script to select) whichever variant is most
> > > > > optimized for the system at hand. It's just not a reasonable way to
> > > > > package a library.
> > > >
> > > > Sorry, perhaps I was not clear; having the user select the appropriate
> > > > library was not what I was suggesting. Instead, I was suggesting that
> > > > the rpm install "librte_pmd_ixgbe.so.generic",
> > > > "librte_pmd_ixgbe.so.sse42" and "librte_pmd_ixgbe.so.avx". Then the
> > > > rpm post-install script would look at the cpu flags in cpuinfo and
> > > > symlink librte_pmd_ixgbe.so to the best-match version.
> > > > That way the user only has to link against "librte_pmd_ixgbe.so", and
> > > > depending on the system it's run on, the loader will automatically
> > > > resolve the symbols from the appropriate instruction-set-specific .so
> > > > file.
> > >
> > > This is an absolute packaging nightmare; it will potentially break all
> > > sorts of corner cases and support processes. To cite a few examples:
> > >
> > > 1) Upgrade support - What if the minimum cpu requirements for dpdk are
> > > advanced at some point in the future? The above strategy has no way to
> > > know that a given update has more advanced requirements than a previous
> > > update, and when the update is installed, the previously linked library
> > > for the old baseline will disappear, leaving broken applications
> > > behind.
> >
> > Firstly, I didn't know we could actually specify minimum cpu
> > requirements for packaging; that is something that could be useful :-)
>
> You misread my comment :). I didn't say we could specify minimum cpu
> requirements at packaging time (you can't, beyond the general arch); I
> said "what if the dpdk's cpu requirements were raised?". Completely
> different thing. Currently the default, lowest-common-denominator system
> that dpdk appears to build for is core2 (as listed in the old default
> config). What if at some point you raise those requirements and decide
> that SSE4.2 really is required to achieve maximum performance? Using the
> above strategy, any system that doesn't meet the new requirements will
> silently break on such an update. That's not acceptable.

Core2 was the first set of Intel chips with the x86_64 instruction set (the
Core microarchitecture), so that's why it's listed as the minimum - it's the
same thing as generic x86_64 support. :-)

> > Secondly, what is the normal case for handling something like this,
> > where an upgrade has enhanced requirements compared to the previous
> > version?
> > Presumably you either need to prevent the upgrade from happening or
> > else accept a broken app. Can the same mechanism not also be used to
> > prevent upgrades under a multi-lib scheme?
>
> The case for handling something like this is: don't do it. When you
> package something for Fedora (or any distro), you provide an implicit
> guarantee that it will run (or fail gracefully) on all supported systems.
> You can add support for systems as you go forward, but you can't
> deprecate support for systems within a major release. That is to say, if
> something runs on F20 now, it's got to keep running on F20 for the
> lifetime of F20. If it stops running, that's a regression; the user opens
> a bug and you fix it.
>
> The DPDK is way off the reservation in this regard. Application packages,
> as a general rule, aren't built with specific cpu features in mind,
> because performance, while important, isn't on the same scale as what
> you're trying to do in the dpdk. A process getting scheduled off the cpu
> while we handle an interrupt wipes out any speedup gained from
> micro-optimizations, so there's no point in doing so. The DPDK is
> different, I understand that, but the drawback is that it (the DPDK)
> needs to make optimizations that really aren't considered particularly
> important to the rest of user space. I'm trying to opportunistically make
> the DPDK as fast as possible, but I need to do it in a single binary that
> works on a lowest-common-denominator system.
>
> > > 2) Debugging - It's going to be near impossible to support an
> > > application built with a package put together this way, because you'll
> > > never be sure which version of the library was running when the crash
> > > occurred. You can figure it out for certain, but support/development
> > > people needing to remember to figure this out is going to be a major
> > > turn-off for them, and the result will be that they simply won't use
> > > the dpdk.
> > > It's anathema to the expectations of linux user space.
> >
> > Sorry, I just don't see this as being any harder to support than
> > multiple code paths for the same functionality. In fact, it will surely
> > make debugging easier, since you only have the one code path, just
> > compiled up in different ways.
>
> Well then, by all means become a Fedora packager, and you can take over
> the DPDK maintenance there :). Until then, you'll just have to trust me.
> If you have multiple optional code paths (especially if they're limited
> to isolated features), it's manageable. But regardless of how you look at
> it, building the same source multiple times with different cpu support
> means completely different binaries. The assembly and optimization are
> just plain different. They may be close, but they're not the same, and
> they need to be QA-ed independently. With a single build and optional
> code paths, all the common code is executed no matter what system you're
> running on, and it's always the same. Multiple builds with different
> instruction support mean that code that is identical at the source level
> may well be significantly different at the binary level, and that's not
> something I can sanely manage in a general-purpose environment.
>
> > > 3) QA - Building multiple versions of a library means needing to QA
> > > multiple versions of a library. If you have to have 4 builds to
> > > support different levels of optimization, you've created a 4x increase
> > > in the amount of testing you need to do to ensure consistent behavior.
> > > You need to be aware of how many different builds are available in the
> > > single rpm at all times, and find systems on which to QA that will
> > > ensure all of the builds get tested (as they are, in fact, unique
> > > builds). While you may not hit all code paths in a single build, you
> > > will at least test all the common paths.
> >
> > Again, the exact same QA conditions will also apply to an approach
> > using multiple code paths bundled into the same library. Given a choice
> > between one code path compiled multiple times, versus multiple code
> > paths each compiled only once, the multiple-code-paths option leaves
> > far greater scope for bugs, and when bugs do occur it means you always
> > have to find out what specific hardware it was being run on. Using the
> > exact same code compiled multiple ways, the vast, vast majority of bugs
> > are going to occur across all platforms and systems, so you should
> > rarely need to ask which specific platform is being used.
>
> No, they won't (see above). Enabling instructions lets the compiler emit
> and optimize common paths differently, so identical source code will
> lead to different binary code. I need to have a single binary so that I
> know what I'm working with when someone opens a bug. I don't have that
> with a multiple-binary approach. At least with multiple runtime paths
> (especially/specifically with the run-time paths we've been discussing,
> the ixgbe rx vector path and the acl library, which are isolated), I
> know that if I get a bug report and the backtrace ends in either
> location, I'm specifically dealing with that code. With your
> multiple-binary approach, if I get a crash in, say, rte_eal_init, I need
> to figure out if the crash happened in the sse3-compiled binary, the
> sse4.2-compiled binary, the avx binary, the avx512 binary, or the core2
> binary. You can say that's easy, but it's easy to say when you're not
> the one who has to support it.
>
> > > The bottom line is that distribution packaging is all about
> > > consistency and commonality. If you install something for an arch on
> > > multiple systems, it's the same thing on each system, and it works in
> > > the same way, all the time. This strategy breaks that. That's why we
> > > do run-time checks for things.
> >
> > If you want to have the best-tuned code running for each instruction
> > set, then commonality and consistency go out the window anyway,
>
> So, this is perhaps where communication is breaking down. I don't want to
> have the best-tuned code running for each instruction set. What I want is
> for the dpdk to run on a lowest-common-denominator platform, and to be
> able to opportunistically take advantage of accelerated code paths that
> require advanced cpu features.
>
> Let's take the ixgbe code as an example. Note I didn't add any code paths
> there at all (in fact I didn't add any anywhere). The ixgbe rx_burst
> method gets set according to compile-time configuration. You can pick the
> bulk_alloc rx method, or the vectorized rx method, at compile time (or
> some others I think, but that's not relevant). As it happened, the
> vectorized rx path option had an implicit dependency on SSE4.2. Instead
> of requiring that all cpus that run the dpdk have SSE4.2, I instead chose
> to move that compile-time decision to a run-time decision, by building
> only the vectorized path with sse4.2 and only using it if we see that the
> cpu supports sse4.2 at run time. No new paths created, no new support
> requirements; you're still supporting the same options upstream, the only
> difference is I was able to include them both in a single binary. That's
> better for our end users because the single binary still works
> everywhere. That's better for our QA group because, for whatever set of
> tests they perform, they only need an sse4.2-enabled system to test the
> one isolated path for that vector rx code. The rest of their tests can be
> conducted once, on any system, because the binary is exactly the same. If
> we compile multiple binaries, testing on one system doesn't mean we've
> tested all the code.
>
> > because two different machines calling the same function are going to
> > execute different sets of instructions.
> > The decision then becomes:
>
> But that's not at all what I wanted. I want two different machines
> calling the same function to execute the same instructions 99.9999% of
> the time. The only time I want to diverge from that is in isolated paths
> where we can take advantage of a feature that we otherwise could not
> (i.e. the ixgbe and acl code). I look at it like the alternatives code in
> linux: there are these isolated areas where you have limited bits of code
> that at run time are rewritten to use available cpu features. 99.9% of
> the code is identical, but in these little spots it's ok to diverge from
> similarity because they're isolated and easily identifiable.
>
> > a) whether you need multiple sets of instructions - if no, then you
> > pay with a lack of performance;
> > b) how you get those multiple sets of instructions;
> > c) how you validate those multiple sets of instructions.
> >
> > As is clear by now :-), my preference by far is to have multiple sets
> > of instructions come from a single code base, as less code means less
> > maintenance and, above all, fewer bugs. If that can't be done, then we
> > need to look carefully at each code path being added and do a
> > cost-benefit analysis on it.
>
> Yes, it's quite clear :). I think it's equally clear that I need a single
> binary, and would like to opportunistically enhance it where possible
> without losing the fact that it's a single binary.
>
> I suppose it's all somewhat moot at this point, though. The reduction to
> sse3 for ixgbe seems agreeable to everyone, and it lets me preserve
> single-binary builds there. I'm currently working on the ACL library; as
> you noted, that's a tougher nut to crack. I think I'll have it done early
> next week (though I'm sure my translation of the instruction set
> reference to C will need some thorough testing :)). I'll post it when
> it's ready.

Agreed, let's get everything working to a common baseline anyway.
In terms of the number of RX and TX functions you mentioned, I'd hope that
in future we can cut the number of them down a bit as we make the vector
versions more generally applicable, but that's a whole discussion for
another day.

/Bruce