From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-we0-f181.google.com (mail-we0-f181.google.com [74.125.82.181]) by dpdk.org (Postfix) with ESMTP id D55ADB39F for ; Wed, 13 Aug 2014 01:16:37 +0200 (CEST) Received: by mail-we0-f181.google.com with SMTP id k48so10413568wev.26 for ; Tue, 12 Aug 2014 16:19:35 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:organization :user-agent:in-reply-to:references:mime-version :content-transfer-encoding:content-type; bh=04btLs2vqWdgqSAQfxzuqeLry2sO+byu0r2knR2wjCc=; b=I6J3Zl3D3pVi6yT/z6Fd0KTc74Mvzf7Vx0f6MdXOC33FevcXnlOwiBpA98Ytq7ffe1 KHJX7mZYXXkEzBFiRTcoN++g+PZk4dArJtpwU20BAYgoS4mF8oEh14AtpSuQ8z4QqvfZ ab8vaLhH23kZhbbrJRGObeMTqcGNe5ECvAhbHpBXT1Y2TUnuW5uMxk5mTkOKT5ySXi2c cFZx6dMhsQxt94Dawpz+xEKx4juAAlqg3IDaBNTY2b7Bl1LPmu8HIpK1SSqlJR2V5Np1 WOUmqIfVClXSAqQzVeEwXdwB5t98VthcHqxqft9Uyz49y8Utvew7MAIjrkrQTn4G1SfM qd3w== X-Gm-Message-State: ALoCoQkrQ7n9W4s7wHynIGnCZfCDoC0lz7hlsqMis9CnwXzbLpJ52RXnXJto0JLh/7h3DSKkRiyP X-Received: by 10.194.142.148 with SMTP id rw20mr631603wjb.69.1407885575055; Tue, 12 Aug 2014 16:19:35 -0700 (PDT) Received: from xps13.localnet (APoitiers-651-1-236-237.w83-193.abo.wanadoo.fr. [83.193.123.237]) by mx.google.com with ESMTPSA id x11sm335495wjr.15.2014.08.12.16.19.33 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 12 Aug 2014 16:19:34 -0700 (PDT) From: Thomas Monjalon To: Neil Horman , "Richardson, Bruce" , "Ananyev, Konstantin" Date: Wed, 13 Aug 2014 01:19:09 +0200 Message-ID: <3764937.VOzyDKeiIj@xps13> Organization: 6WIND User-Agent: KMail/4.13.3 (Linux/3.15.8-1-ARCH; KDE/4.13.3; x86_64; ; ) In-Reply-To: <20140806172709.GA23133@hmsreliant.think-freely.org> References: <1407166558-9532-1-git-send-email-nhorman@tuxdriver.com> <59AF69C657FD0841A61C55336867B5B0343D666D@IRSMSX103.ger.corp.intel.com> <20140806172709.GA23133@hmsreliant.think-freely.org> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Cc: dev@dpdk.org Subject: Re: [dpdk-dev] [PATCH] acl: If build does not support sse4.2, emulate missing instructions with C code X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Aug 2014 23:16:38 -0000 Hi all, 2014-08-06 13:27, Neil Horman: > Richardson, Bruce wrote: > > Neil Horman > > > Ananyev, Konstantin wrote: > > > > Neil Horman > > > > > Ananyev, Konstantin wrote: > > > > > > As you probably note - we do have a scalar version of rte_acl_classify(): > > > > > > rte_acl_classify_scalar(). > > > > > > So I think it might be faster than vector one with 'emulated' instrincts. > > > > > > Unfortunately it is all mixed right now into one file and 'scalar' version > > > > > > could use few sse4 instrincts through resolve_priority(). > > > > > > Another thing - we consider to add another version of rte_acl_classify() > > > > > > that will use avx2 instrincts. > > > > > > If we go the way you suggest - I am afraid will soon have to provide scalar > > > > > > equivalents for several AVX2 instrincts too. > > > > > > So in summary that way (providing our own scalar equivalents of SIMD > > > > > > instrincts) seems to me slow, hard to maintain and error > > > > > > prone. > > > > > > > > > > > > What porbably can be done instead: rework acl_run.c a bit, so it provide: > > > > > > rte_acl_classify_scalar() - could be build and used on all systems. > > > > > > rte_acl_classify_sse() - could be build and used only on systems with sse4.2 > > > > > > and upper, return ENOTSUP on lower arch. > > > > > > In future: rte_acl_classify_avx2 - could be build and used only on systems > > > > > > with avx2 and upper, return ENOTSUP on lower arch. > > > > > > > > > > > > I am looking at rte_acl right now anyway. > > > > > > So will try to come up with something workable. > > > > > > > > > > So, this is exactly the opposite of what Bruce and I just spent several days > > > > > and a huge email thread that you clearly are aware of discussing run time versus > > > > > compile time selection of paths. At this point I'm done ping ponging between > > > > > your opposing viewpoints. If you want to implement something that does > > > > > run time checking, I'm fine with it, but I'm not going back and forth until you > > > > > two come to an agreement on this. > > > > > > > > Right now, I am not talking about 'run time vs compile time selection'. > > > > > > But you are talking about exactly that, allbeit implicitly. To implement what > > > you recommend above (that being multiple functional paths that return a not > > > supported error code at run time), we need to make run time tests for what the > > > cpu supports. While I'm actually ok with doing that (I think it makes alot of > > > sense), Bruce and I just spent several days and dozens of emails debating that, > > > so you can understand why I don't want to write yet another version of this > > > patch that requires doing the exact thing we just argued about, especially if it > > > means he's going to pipe back up and say no, driving me back to a common single > > > implementation that compiles and runs for all platforms. I'm not going to keep > > > re-writing this boucing back and forth between your opposing viewpoints. We > > > need to agree on a direction before I make another pass at this. > > > > > > > 2) allow easily add(/modify) code path optimised for particular architecture. > > > > Without need to modify/re-test what you call 'least common denominator' > > > > code path. > > > > And visa-versa, if someone find a way to optimise common code path - no > > > > need to touch/retest architecture specific ones. > > > > > > So I'm fine with this, but it is anathema to what Bruce advocated for when I did > > > this latest iteration. Bruce advocated for a single common path that compiled > > > in all cases. Bruce, do you want to comment here? I'd really like to get this > > > settled before I go try this again. > > > > In our previous discussion I was primarily concerned with the ixgbe driver, > > which already had a number of scalar code paths as well as the vector one, > > so I was very keen there not to see more code paths created. > > However, while I hate seeing more code paths created that need to be maintained, > > I am ok with having them created if the benefit is big enough. > > Up till now code path selection would have been done at compile-time, but you've > > convinced me that if we have the code paths there, selecting them at runtime makes > > more sense for a packaged build. > > For ACL specifically, I generally defer to Konstantin as the expert in this area. > > If we need separate code paths for scalar, SSE and AVX, and each gives considerable > > performance improvement over the other, then I'm ok with that. > > I'm still not sure how you thought I was creating new code paths in the ixgbe > driver using run time selection vs. compile time selection, but regardless, if > run time path selection is the consensus, thats good, I can do that. Thanks everyone, it seems we have a consensus. I think it's important to summarize it here: 1) If benefit is big, we allow creation of different code paths, which means different implementations of a same feature in order to have best performance with recent CPU. 2) The choice of the code path can be done at run-time, so it is possible to package a binary which works on default architecture and use latest instructions if available. The current situation requires to build DPDK for the right architecture (native one) in order to have the best performances. This situation will be improved in some critical areas but it isn't planned to fork code paths systematically. Thanks everyone for these efforts -- Thomas