From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f54.google.com (mail-wm0-f54.google.com [74.125.82.54]) by dpdk.org (Postfix) with ESMTP id 6381A6C97 for ; Mon, 29 Feb 2016 17:15:50 +0100 (CET) Received: by mail-wm0-f54.google.com with SMTP id l68so65513004wml.1 for ; Mon, 29 Feb 2016 08:15:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind-com.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:organization:user-agent :in-reply-to:references:mime-version:content-transfer-encoding; bh=8URIiF94pJRcc4N1UQeg7CptYBCAhNKNO71FuqFtXqs=; b=YKXvR9MJ9Kn/1PrnO/1FSENiuGgoWWJvGjcu6kU+SuMyjlM3YBSq8vOk5/skhLGDGu V5XGSmneNw5YUVfoyylA9GpSRaMwzsyd8IleZnlOULzzfAQRwLQL3kqG73KkyU0hKZMU 2/B9Klxp/531QxcYFXuDKLuwv2LC1DmGJ7B4/5ImWNU1YxswZwMgn7Qm7Ds4wzPJjlWY 2li305FneSqAjKOtBScnd//BlsQXVLcc2UPRToCiV8es8hYbeEV+A8r8RpkQyITWQ1q8 utELA+l3FwmXbmushHDXLXzwAMPq0FvqUrSNWrfGGA/S+lfeUxG4+v7BuovaArCABW46 pkBA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:organization :user-agent:in-reply-to:references:mime-version :content-transfer-encoding; bh=8URIiF94pJRcc4N1UQeg7CptYBCAhNKNO71FuqFtXqs=; b=j512Xl2oc5340Ye8RaTkni5Jbir9acJ+vcZXGix5pvCW6qOz+5a/7OM4v5iBAaX/vN to016H1G3GFHgeyw24/vxXNM8HUehDDRtm8M6Jk3NOscXAiDJNIaDiCm9X9+t9MPNxO/ mIHC9qw3PIGa8b1L1BklinCqPnRjaocZiQXu+A8nMQUsCH51P9LxppZO5MHyWQxnNlGK yQy0yUJQ2aXS3GQB32U5v8TiGVedzEKh6KDNwHigVWNBxEPJnCqG29bT3B2w1vVhloEF QgmhLEf9BfH7vCzC4Oxo8ef0O3wZin0gqLHgfrVlpJmdakHOoJ4/524Nt7SJLttSU8fd 4Ojw== X-Gm-Message-State: AD7BkJKfEnk43GxQotUuFv1dktTxergiTB7HdOOVA+0ybtMDeMyeqIvJGWnMWLdxaD4EBl/x X-Received: by 10.28.218.145 with SMTP id r139mr13703225wmg.52.1456762550004; Mon, 29 Feb 2016 08:15:50 -0800 (PST) Received: from xps13.localnet (171.36.101.84.rev.sfr.net. [84.101.36.171]) by smtp.gmail.com with ESMTPSA id 77sm17131942wmp.18.2016.02.29.08.15.48 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 29 Feb 2016 08:15:49 -0800 (PST) From: Thomas Monjalon To: Panu Matilainen Date: Mon, 29 Feb 2016 17:14:15 +0100 Message-ID: <1837798.vOsM3O1Uf3@xps13> Organization: 6WIND User-Agent: KMail/4.14.10 (Linux/4.1.6-1-ARCH; KDE/4.14.11; x86_64; ; ) In-Reply-To: <56D42296.6090603@redhat.com> References: <1450049754-33635-1-git-send-email-huawei.xie@intel.com> <2601191342CEEE43887BDE71AB97725836B0AC3F@irsmsx105.ger.corp.intel.com> <56D42296.6090603@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Cc: dev@dpdk.org, "dprovan@bivio.net" Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 29 Feb 2016 16:15:50 -0000 2016-02-29 12:51, Panu Matilainen: > On 02/24/2016 03:23 PM, Ananyev, Konstantin wrote: > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen > >> On 02/23/2016 07:35 AM, Xie, Huawei wrote: > >>> On 2/22/2016 10:52 PM, Xie, Huawei wrote: > >>>> On 2/4/2016 1:24 AM, Olivier MATZ wrote: > >>>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote: > >>>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of > >>>>>> the library ABI and should not be listed in the version map. > >>>>>> > >>>>>> I assume its inline for performance reasons, but then you lose the > >>>>>> benefits of dynamic linking such as ability to fix bugs and/or improve > >>>>>> itby just updating the library. Since the point of having a bulk API is > >>>>>> to improve performance by reducing the number of calls required, does it > >>>>>> really have to be inline? As in, have you actually measured the > >>>>>> difference between inline and non-inline and decided its worth all the > >>>>>> downsides? > >>>>> Agree with Panu. It would be interesting to compare the performance > >>>>> between inline and non inline to decide whether inlining it or not. > >>>> Will update after i gathered more data. inline could show obvious > >>>> performance difference in some cases. > >>> > >>> Panu and Oliver: > >>> I write a simple benchmark. This benchmark run 10M rounds, in each round > >>> 8 mbufs are allocated through bulk API, and then freed. > >>> These are the CPU cycles measured(Intel(R) Xeon(R) CPU E5-2680 0 @ > >>> 2.70GHz, CPU isolated, timer interrupt disabled, rcu offloaded). > >>> Btw, i have removed some exceptional data, the frequency of which is > >>> like 1/10. Sometimes observed user usage suddenly disappeared, no clue > >>> what happened. > >>> > >>> With 8 mbufs allocated, there is about 6% performance increase using inline. > >> [...] > >>> > >>> With 16 mbufs allocated, we could still observe obvious performance > >>> difference, though only 1%-2% > >>> > >> [...] > >>> > >>> With 32/64 mbufs allocated, the deviation of the data itself would hide > >>> the performance difference. > >>> So we prefer using inline for performance. > >> > >> At least I was more after real-world performance in a real-world > >> use-case rather than CPU cycles in a microbenchmark, we know function > >> calls have a cost but the benefits tend to outweight the cons. > >> > >> Inline functions have their place and they're far less evil in project > >> internal use, but in library public API they are BAD and should be ... > >> well, not banned because there are exceptions to every rule, but highly > >> discouraged. > > > > Why is that? > > For all the reasons static linking is bad, and what's worse it forces > the static linking badness into dynamically linked builds. > > If there's a bug (security or otherwise) in a library, a distro wants to > supply an updated package which fixes that bug and be done with it. But > if that bug is in an inlined code, supplying an update is not enough, > you also need to recompile everything using that code, and somehow > inform customers possibly using that code that they need to not only > update the library but to recompile their apps as well. That is > precisely the reason distros go to great lenghts to avoid *any* > statically linked apps and libs in the distro, completely regardless of > the performance overhead. > > In addition, inlined code complicates ABI compatibility issues because > some of the code is one the "wrong" side, and worse, it bypasses all the > other ABI compatibility safeguards like soname and symbol versioning. > > Like said, inlined code is fine for internal consumption, but incredibly > bad for public interfaces. And of course, the more complicated a > function is, greater the potential of needing bugfixes. > > Mind you, none of this is magically specific to this particular > function. Except in the sense that bulk operations offer a better way of > performance improvements than just inlining everything. > > > As you can see right now we have all mbuf alloc/free routines as static inline. > > And I think we would like to keep it like that. > > So why that particular function should be different? > > Because there's much less need to have it inlined since the function > call overhead is "amortized" by the fact its doing bulk operations. "We > always did it that way" is not a very good reason :) > > > After all that function is nothing more than a wrapper > > around rte_mempool_get_bulk() unrolled by 4 loop {rte_pktmbuf_reset()} > > So unless mempool get/put API would change, I can hardly see there could be any ABI > > breakages in future. > > About 'real world' performance gain - it was a 'real world' performance problem, > > that we tried to solve by introducing that function: > > http://dpdk.org/ml/archives/dev/2015-May/017633.html > > > > And according to the user feedback, it does help: > > http://dpdk.org/ml/archives/dev/2016-February/033203.html > > The question is not whether the function is useful, not at all. The > question is whether the real-world case sees any measurable difference > in performance if the function is made non-inline. This is a valid question, and it applies to a large part of DPDK. But it's something to measure and change more globally than just a new function. Generally speaking, any effort to reduce the size of the exported headers will be welcome. That said, this patch won't be blocked.