From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 1 Aug 2014 14:08:22 -0700
From: Bruce Richardson
To: Neil Horman
Message-ID: <20140801210821.GF28495@localhost.localdomain>
In-Reply-To: <20140801204352.GF31979@hmsreliant.think-freely.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Organization: Intel Shannon Limited. Registered in Ireland. Registered Office: Collinstown Industrial Park, Leixlip, County Kildare. Registered Number: 308263. Business address: Dromore House, East Park, Shannon, Co. Clare.
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH 0/2] dpdk: Allow for dynamic enablement of some isolated features

On Fri, Aug 01, 2014 at 04:43:52PM -0400, Neil Horman wrote:
> On Fri, Aug 01, 2014 at 12:22:22PM -0700, Bruce Richardson wrote:
> > On Fri, Aug 01, 2014 at 11:06:29AM -0400, Neil Horman wrote:
> > > On Thu, Jul 31, 2014 at 01:25:06PM -0700, Bruce Richardson wrote:
> > > > On Thu, Jul 31, 2014 at 04:10:18PM -0400, Neil Horman wrote:
> > > > > On Thu, Jul 31, 2014 at 11:36:32AM -0700, Bruce Richardson wrote:
> > > > > > On Thu, Jul 31, 2014 at 02:10:32PM -0400, Neil Horman wrote:
> > > > > > > On Thu, Jul 31, 2014 at 10:32:28AM -0400, Neil Horman wrote:
> > > > > > > > On Thu, Jul 31, 2014 at 03:26:45PM +0200, Thomas Monjalon wrote:
> > > > > > > > > 2014-07-31 09:13, Neil Horman:
> > > > > > > > > > On Wed, Jul 30, 2014 at 02:09:20PM -0700, Bruce Richardson wrote:
> > > > > > > > > > > On Wed, Jul 30, 2014 at 03:28:44PM -0400, Neil Horman wrote:
> > > > > > > > > > > > On Wed, Jul 30, 2014 at 11:59:03AM -0700, Bruce Richardson wrote:
> > > > > > > > > > > > > On Tue, Jul 29, 2014 at 04:24:24PM -0400, Neil Horman wrote:
> > > > > > > > > > > > > > Hey all-
> > > > > >
> > > > > > With regards to the general approach for runtime detection of
> > > > > > software functions, I wonder if something like this can be handled
> > > > > > by the packaging system? Is it possible to ship out a set of shared
> > > > > > libs compiled up for different instruction sets, and then at rpm
> > > > > > install time, symlink the appropriate library?
> > > > > > This would push the whole issue of detection of code paths outside
> > > > > > of the code, work across all our libraries, and ensure each user
> > > > > > got the best performance they could get from a binary.
> > > > > > Has something like this been done before? The building of all the
> > > > > > libraries could be scripted easily enough: just do multiple builds
> > > > > > using different EXTRA_CFLAGS each time, and move and rename the
> > > > > > .so's after each run.
> > > > >
> > > > > Sorry, I missed this in my last reply.
> > > > >
> > > > > In answer to your question, the short version is that such a thing
> > > > > is roughly possible from a packaging standpoint, but completely
> > > > > unworkable from a distribution standpoint. We could certainly build
> > > > > the dpdk multiple times and rename all the shared objects to some
> > > > > variant name representative of the optimizations we built in for
> > > > > certain cpu flags, but then we would be shipping X versions of the
> > > > > dpdk, and any application (say OVS) that made use of the dpdk would
> > > > > need to provide a version linked against each variant to be useful
> > > > > when making a product, and each end user would need to manually
> > > > > select (or run a script to select) whichever variant is most
> > > > > optimized for the system at hand. It's just not a reasonable way to
> > > > > package a library.
> > > >
> > > > Sorry, perhaps I was not clear; having the user select the appropriate
> > > > library was not what I was suggesting. Instead, I was suggesting that
> > > > the rpm install "librte_pmd_ixgbe.so.generic",
> > > > "librte_pmd_ixgbe.so.sse42" and "librte_pmd_ixgbe.so.avx". Then the
> > > > rpm post-install script would look at the cpu flags in cpuinfo and
> > > > symlink librte_pmd_ixgbe.so to the best-match version.
> > > > That way the user only has to link against "librte_pmd_ixgbe.so", and
> > > > depending on the system it's run on, the loader will automatically
> > > > resolve the symbols from the appropriate instruction-set-specific .so
> > > > file.
> > >
> > > This is an absolute packaging nightmare; it will potentially break all
> > > sorts of corner cases and support processes. To cite a few examples:
> > >
> > > 1) Upgrade support - What if the minimum cpu requirements for dpdk are
> > > advanced at some point in the future? The above strategy has no way to
> > > know that a given update has more advanced requirements than a previous
> > > update, and when the update is installed, the previously linked library
> > > for the old baseline will disappear, leaving broken applications
> > > behind.
> >
> > Firstly, I didn't know we could actually specify minimum cpu
> > requirements for packaging; that is something that could be useful :-)
>
> You misread my comment :). I didn't say we could specify minimum cpu
> requirements at packaging time (you can't, beyond the general arch); I
> said "what if the dpdk's cpu requirements were raised?". Completely
> different thing. Currently the default, lowest-common-denominator system
> that dpdk appears to build for is core2 (as listed in the old default
> config). What if at some point you raise those requirements and decide
> that SSE4.2 really is required to achieve maximum performance? Using the
> above strategy, any system that doesn't meet the new requirements will
> silently break on such an update. That's not acceptable.

Core2 was the first set of Intel chips with the x86_64 instruction set (the
Core microarchitecture), so that's why it's listed as the minimum - it's the
same thing as generic x86_64 support. :-)

> > Secondly, what is the normal case for handling something like this,
> > where an upgrade has enhanced requirements compared to the previous
> > version?
> > Presumably you either need to prevent the upgrade from happening or
> > else accept a broken app. Can the same mechanism not also be used to
> > prevent upgrades under a multi-lib scheme?
>
> The case for handling something like this is: don't do it. When you
> package something for Fedora (or any distro), you provide an implicit
> guarantee that it will run (or fail gracefully) on all supported systems.
> You can add support for systems as you go forward, but you can't
> deprecate support for systems within a major release. That is to say, if
> something runs on F20 now, it's got to keep running on F20 for the
> lifetime of F20. If it stops running, that's a regression; the user opens
> a bug and you fix it.
>
> The DPDK is way off the reservation in this regard. Application packages,
> as a general rule, aren't built with specific cpu features in mind,
> because performance, while important, isn't on the same scale as what
> you're trying to do in the dpdk. A process getting scheduled off the cpu
> while we handle an interrupt wipes out any speedup gained from
> micro-optimizations, so there's no point in doing so. The DPDK is
> different, I understand that, but the drawback is that it (the DPDK)
> needs to make optimizations that really aren't considered particularly
> important to the rest of user space. I'm trying to opportunistically make
> the DPDK as fast as possible, but I need to do it in a single binary that
> works on a lowest-common-denominator system.
>
> > > 2) Debugging - It's going to be near impossible to support an
> > > application built with a package put together this way, because you'll
> > > never be sure which version of the library was running when the crash
> > > occurred. You can figure it out for certain, but support/development
> > > people needing to remember to figure this out is going to be a major
> > > turn-off for them, and the result will be that they simply won't use
> > > the dpdk.
> > > It's anathema to the expectations of linux user space.
> >
> > Sorry, I just don't see this as being any harder to support than
> > multiple code paths for the same functionality. In fact, it will surely
> > make debugging easier, since you only have the one code path, just
> > compiled up in different ways.
>
> Well then, by all means become a Fedora packager, and you can take over
> the DPDK maintenance there :). Until then, you'll just have to trust me.
> If you have multiple optional code paths (especially if they're limited
> to isolated features), it's manageable. But regardless of how you look at
> it, building the same source multiple times with different cpu support
> means completely different binaries. The assembly and optimization are
> just plain different. They may be close, but they're not the same, and
> they need to be QA-ed independently. With a single build and optional
> code paths, all the common code is executed no matter what system you're
> running on, and it's always the same. Multiple builds with different
> instruction support mean that code that is identical at the source level
> may well be significantly different at the binary level, and that's not
> something I can sanely manage in a general-purpose environment.
>
> > > 3) QA - Building multiple versions of a library means needing to QA
> > > multiple versions of a library. If you have to have 4 builds to
> > > support different levels of optimization, you've created a 4x increase
> > > in the amount of testing you need to do to ensure consistent behavior.
> > > You need to be aware of how many different builds are available in the
> > > single rpm at all times, and find systems on which to QA that will
> > > ensure all of the builds get tested (as they are, in fact, unique
> > > builds). While you may not hit all code paths in a single build, you
> > > will at least test all the common paths.
> >
> > Again, the exact same QA conditions will also apply to an approach
> > using multiple code paths bundled into the same library. Given a choice
> > between one code path compiled multiple times, versus multiple code
> > paths each compiled only once, the multiple-code-paths option leaves
> > far greater scope for bugs, and when bugs do occur it means you always
> > have to find out what specific hardware it was being run on. Using the
> > exact same code compiled multiple ways, the vast, vast majority of bugs
> > are going to occur across all platforms and systems, so you should
> > rarely need to ask which specific platform is being used.
>
> No, they won't (see above). Enabling instructions lets the compiler emit
> and optimize common paths differently, so identical source code will
> lead to different binary code. I need to have a single binary so that I
> know what I'm working with when someone opens a bug. I don't have that
> with a multiple-binary approach. At least with multiple runtime paths
> (especially/specifically with the run-time paths we've been discussing,
> the ixgbe rx vector path and the acl library, which are isolated), I
> know that if I get a bug report and the backtrace ends in either
> location, I'm specifically dealing with that code. With your
> multiple-binary approach, if I get a crash in, say, rte_eal_init, I need
> to figure out if the crash happened in the sse3-compiled binary, the
> sse4.2-compiled binary, the avx binary, the avx512 binary, or the core2
> binary. You can say that's easy, but it's easy to say when you're not
> the one who has to support it.
>
> > > The bottom line is that distribution packaging is all about
> > > consistency and commonality. If you install something for an arch on
> > > multiple systems, it's the same thing on each system, and it works in
> > > the same way, all the time. This strategy breaks that. That's why we
> > > do run-time checks for things.
> >
> > If you want to have the best-tuned code running for each instruction
> > set, then commonality and consistency go out the window anyway,
>
> So, this is perhaps where communication is breaking down. I don't want to
> have the best-tuned code running for each instruction set. What I want is
> for the dpdk to run on a lowest-common-denominator platform, and to be
> able to opportunistically take advantage of accelerated code paths that
> require advanced cpu features.
>
> Let's take the ixgbe code as an example. Note I didn't add any code paths
> there at all (in fact I didn't add any anywhere). The ixgbe rx_burst
> method gets set according to compile-time configuration. You can pick the
> bulk_alloc rx method, or the vectorized rx method, at compile time (or
> some others I think, but that's not relevant). As it happened, the
> vectorized rx path option had an implicit dependency on SSE4.2. Instead
> of requiring that all cpus that run the dpdk have SSE4.2, I instead chose
> to move that compile-time decision to a run-time decision, by building
> only the vectorized path with sse4.2 and only using it if we see that the
> cpu supports sse4.2 at run time. No new paths created, no new support
> requirements; you're still supporting the same options upstream, the only
> difference is I was able to include them both in a single binary. That's
> better for our end users because the single binary still works
> everywhere. That's better for our QA group because, for whatever set of
> tests they perform, they only need an sse4.2-enabled system to test the
> one isolated path for that vector rx code. The rest of their tests can be
> conducted once, on any system, because the binary is exactly the same. If
> we compile multiple binaries, testing on one system doesn't mean we've
> tested all the code.
>
> > because two different machines calling the same function are going to
> > execute different sets of instructions.
> > The decision then becomes:
>
> But that's not at all what I wanted. I want two different machines
> calling the same function to execute the same instructions 99.9999% of
> the time. The only time I want to diverge from that is in isolated paths
> where we can take advantage of a feature that we otherwise could not
> (i.e. the ixgbe and acl code). I look at it like the alternatives code in
> linux: there are these isolated areas where you have limited bits of code
> that at run time are rewritten to use available cpu features. 99.9% of
> the code is identical, but in these little spots it's ok to diverge from
> similarity because they're isolated and easily identifiable.
>
> > a) whether you need multiple sets of instructions - if no, then you
> > pay with a lack of performance;
> > b) how you get those multiple sets of instructions;
> > c) how you validate those multiple sets of instructions.
> >
> > As is clear by now :-), my preference by far is to have multiple sets
> > of instructions come from a single code base, as less code means less
> > maintenance and, above all, fewer bugs. If that can't be done, then we
> > need to look carefully at each code path being added and do a
> > cost-benefit analysis on it.
>
> Yes, it's quite clear :). I think it's equally clear that I need a single
> binary, and would like to opportunistically enhance it where possible
> without losing the fact that it's a single binary.
>
> I suppose it's all somewhat moot at this point, though. The reduction to
> sse3 for ixgbe seems agreeable to everyone, and it lets me preserve
> single-binary builds there. I'm currently working on the ACL library; as
> you noted, that's a tougher nut to crack. I think I'll have it done early
> next week (though I'm sure my translation of the instruction set
> reference to C will need some thorough testing :)). I'll post it when
> it's ready.

Agreed, let's get everything working to a common baseline anyway.
In terms of the number of RX and TX functions you mentioned, I'd hope that
in future we can cut the number of them down a bit as we make the vector
versions more generally applicable, but that's a whole discussion for
another day.

/Bruce