From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 863DEA0503; Fri, 6 May 2022 19:49:01 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 6C04B40395; Fri, 6 May 2022 19:49:01 +0200 (CEST) Received: from forward501j.mail.yandex.net (forward501j.mail.yandex.net [5.45.198.251]) by mails.dpdk.org (Postfix) with ESMTP id 550084014F for ; Fri, 6 May 2022 19:49:00 +0200 (CEST) Received: from sas1-521fc86eeedc.qloud-c.yandex.net (sas1-521fc86eeedc.qloud-c.yandex.net [IPv6:2a02:6b8:c08:be82:0:640:521f:c86e]) by forward501j.mail.yandex.net (Yandex) with ESMTP id A49F0623578; Fri, 6 May 2022 20:48:59 +0300 (MSK) Received: from sas1-384d3eaa6677.qloud-c.yandex.net (sas1-384d3eaa6677.qloud-c.yandex.net [2a02:6b8:c14:3a29:0:640:384d:3eaa]) by sas1-521fc86eeedc.qloud-c.yandex.net (mxback/Yandex) with ESMTP id JEcDjnbYV5-mwf0EQZo; Fri, 06 May 2022 20:48:59 +0300 X-Yandex-Fwd: 2 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex.ru; s=mail; t=1651859339; bh=VZzCFwZUxHUfJj4AOqg69xDLNs8dm20ZpAj4++oFbAo=; h=In-Reply-To:From:Subject:Cc:References:Date:Message-ID:To; b=KQU8cLeQ7I3t2Qk/ZSoHJ85vRb6eICVq0mNbyImShkg7sxkylSqblNKgqQ+OJnqJj 0eetSGGoCwsz5dYmLkNZGUKDjm5KD4PSssyntTQ6yHHjhAvwm3MWNsG+kVrIjrN1LR 3CB5gJTH4rXZaXJXn7EEXifAjU672yYd1ItnOZ8I= Authentication-Results: sas1-521fc86eeedc.qloud-c.yandex.net; dkim=pass header.i=@yandex.ru Received: by sas1-384d3eaa6677.qloud-c.yandex.net (smtp/Yandex) with ESMTPSA id hL4CuAsrpp-muMuBWxr; Fri, 06 May 2022 20:48:58 +0300 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) (Client certificate not present) Message-ID: <1d8bf669-9ada-5ab1-6eb4-bd2a0771c544@yandex.ru> Date: Fri, 6 May 2022 18:48:34 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.0 Subject: Re: [RFC] rte_ring: don't use always inline Content-Language: en-US To: Bruce Richardson , Stephen Hemminger Cc: Honnappa Nagarahalli , Tyler Retzlaff , "dev@dpdk.org" , nd , "Ananyev, Konstantin" References: <20220505224547.394253-1-stephen@networkplumber.org> <20220506072434.GA19777@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> <20220506093341.785086a7@hermes.local> From: Konstantin Ananyev In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 06/05/2022 17:39, Bruce Richardson пишет: > On Fri, May 06, 2022 at 09:33:41AM -0700, Stephen Hemminger wrote: >> On Fri, 6 May 2022 16:28:41 +0100 >> Bruce Richardson wrote: >> >>> On Fri, May 06, 2022 at 03:12:32PM +0000, Honnappa Nagarahalli wrote: >>>> >>>>> >>>>> On Thu, May 05, 2022 at 10:59:32PM +0000, Honnappa Nagarahalli wrote: >>>>>> Thanks Stephen. Do you see any performance difference with this change? >>>>> >>>>> as a matter of due diligence i think a comparison should be made just to be >>>>> confident nothing is regressing. >>>>> >>>>> i support this change in principal since it is generally accepted best practice to >>>>> not force inlining since it can remove more valuable optimizations that the >>>>> compiler may make that the human can't see. >>>>> the optimizations may vary depending on compiler implementation. >>>>> >>>>> force inlining should be used as a targeted measure rather than blanket on >>>>> every function and when in use probably needs to be periodically reviewed and >>>>> potentially removed as the code / compiler evolves. >>>>> >>>>> also one other consideration is the impact of a particular compiler's force >>>>> inlining intrinsic/builtin is that it may permit inlining of functions when not >>>>> declared in a header. i.e. a function from one library may be able to be inlined >>>>> to another binary as a link time optimization. although everything here is in a >>>>> header so it's a bit moot. >>>>> >>>>> i'd like to see this change go in if possible. >>>> Like Stephen mentions below, I am sure we will have a for and against discussion here. >>>> As a DPDK community we have put performance front and center, I would prefer to go down that route first. >>>> >>> >>> I ran some initial numbers with this patch, and the very quick summary of >>> what I've seen so far: >>> >>> * Unit tests show no major differences, and while it depends on what >>> specific number you are interested in, most seem within margin of error. >>> * Within unit tests, the one number I mostly look at when considering >>> inlining is the "empty poll" cost, since I believe we should look to keep >>> that as close to zero as possible. In the past I've seen that number jump >>> from 3 cycles to 12 cycles due to missed inlining. In this case, it seem >>> fine. >>> * Ran a quick test with the eventdev_pipeline example app using SW eventdev, >>> as a test of an actual app which is fairly ring-heavy [used 8 workers >>> with 1000 cycles per packet hop]. (Thanks to Harry vH for this suggestion >>> of a workload) >>> * GCC 8 build - no difference observed >>> * GCC 11 build - approx 2% perf reduction observed Just to note that apart from ring_perf_autotest, there also exist ring_stress_autotest which trying to do some stress-testing in hopefully more realistic usage scenarios. Might be worth to consider when benchmarking. >>> >>> As I said, these are just some quick rough numbers, and I'll try and get >>> some more numbers on a couple of different platforms, see if the small >>> reduction seen is consistent or not. I may also test a few differnet >>> combinations/options in the eventdev test. It would be good if others also >>> tested on a few platforms available to them. >>> >>> /Bruce >> >> I wonder if a mixed approach might help where some key bits were marked >> as more important to inline? Or setting compiler flags in build infra? > > Yep, could be a number of options.