From: Panu Matilainen
Date: Mon, 29 Feb 2016 12:51:02 +0200
To: "Ananyev, Konstantin", "Xie, Huawei", Olivier MATZ, dev@dpdk.org
Cc: dprovan@bivio.net
Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
Message-ID: <56D42296.6090603@redhat.com>
In-Reply-To: <2601191342CEEE43887BDE71AB97725836B0AC3F@irsmsx105.ger.corp.intel.com>

On 02/24/2016 03:23 PM, Ananyev, Konstantin wrote:
> Hi Panu,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen
>> Sent: Wednesday, February 24, 2016 12:12 PM
>> To: Xie, Huawei; Olivier MATZ; dev@dpdk.org
>> Cc: dprovan@bivio.net
>> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
>>
>> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
>>> On 2/22/2016 10:52 PM, Xie, Huawei wrote:
>>>> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>>>>> Hi,
>>>>>
>>>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>>>>> the library ABI and should not be listed in the version map.
>>>>>>
>>>>>> I assume it's inline for performance reasons, but then you lose the
>>>>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>>>>> improve it by just updating the library. Since the point of having a
>>>>>> bulk API is to improve performance by reducing the number of calls
>>>>>> required, does it really have to be inline? As in, have you actually
>>>>>> measured the difference between inline and non-inline and decided it's
>>>>>> worth all the downsides?
>>>>> Agree with Panu. It would be interesting to compare the performance
>>>>> between inline and non-inline to decide whether to inline it or not.
>>>> Will update after I have gathered more data. Inline could show an
>>>> obvious performance difference in some cases.
>>>
>>> Panu and Olivier:
>>> I wrote a simple benchmark. The benchmark runs 10M rounds; in each round
>>> 8 mbufs are allocated through the bulk API and then freed.
>>> These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
>>> 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
>>> Btw, I have removed some exceptional data points, which occur at a
>>> frequency of roughly 1/10. Sometimes the observed user time suddenly
>>> disappeared; no clue what happened.
>>>
>>> With 8 mbufs allocated, there is about a 6% performance increase using
>>> inline.
>> [...]
>>>
>>> With 16 mbufs allocated, we can still observe an obvious performance
>>> difference, though only 1%-2%.
>>>
>> [...]
>>>
>>> With 32/64 mbufs allocated, the variation in the data itself hides the
>>> performance difference.
>>> So we prefer using inline for performance.
>>
>> At least I was more after real-world performance in a real-world use case
>> rather than CPU cycles in a microbenchmark; we know function calls have a
>> cost, but the benefits tend to outweigh the cons.
>>
>> Inline functions have their place, and they're far less evil in
>> project-internal use, but in a library's public API they are BAD and
>> should be ... well, not banned, because there are exceptions to every
>> rule, but highly discouraged.
>
> Why is that?

For all the reasons static linking is bad, and, what's worse, it forces the
static linking badness into dynamically linked builds.

If there's a bug (security or otherwise) in a library, a distro wants to
supply an updated package which fixes that bug and be done with it. But if
that bug is in inlined code, supplying an update is not enough: you also
need to recompile everything using that code, and somehow inform customers
possibly using that code that they need to not only update the library but
recompile their apps as well.

That is precisely the reason distros go to great lengths to avoid *any*
statically linked apps and libs in the distro, completely regardless of the
performance overhead.

In addition, inlined code complicates ABI compatibility, because some of
the code ends up on the "wrong" side, and, worse, it bypasses all the other
ABI compatibility safeguards like sonames and symbol versioning.

Like I said, inlined code is fine for internal consumption, but incredibly
bad for public interfaces. And of course, the more complicated a function
is, the greater the potential need for bugfixes.

Mind you, none of this is magically specific to this particular function,
except in the sense that bulk operations offer a better route to
performance improvements than just inlining everything.

> As you can see, right now we have all mbuf alloc/free routines as static
> inline. And I think we would like to keep it like that.
> So why should that particular function be different?

Because there's much less need to have it inlined, since the function call
overhead is "amortized" by the fact that it's doing bulk operations.
"We always did it that way" is not a very good reason :)

> After all, that function is nothing more than a wrapper around
> rte_mempool_get_bulk(), plus a loop of {rte_pktmbuf_reset()} unrolled by 4.
> So unless the mempool get/put API changes, I can hardly see how there could
> be any ABI breakage in the future.
> About the 'real world' performance gain - it was a 'real world' performance
> problem that we tried to solve by introducing that function:
> http://dpdk.org/ml/archives/dev/2015-May/017633.html
>
> And according to the user feedback, it does help:
> http://dpdk.org/ml/archives/dev/2016-February/033203.html
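(For illustration only: a minimal sketch of what an inline bulk-allocation
wrapper of that shape looks like, based purely on the description quoted
above. The function name, refcount handling and error path here are
assumptions rather than the actual v6 patch, and the real code unrolls the
reset loop by 4.)

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch only: bulk-allocate 'count' mbufs from 'pool' into 'mbufs'.
 * Returns 0 on success, or the rte_mempool_get_bulk() error code. */
static inline int
pktmbuf_alloc_bulk_sketch(struct rte_mempool *pool,
                          struct rte_mbuf **mbufs, unsigned count)
{
	unsigned i;
	int rc;

	/* One trip into the mempool for the whole batch... */
	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (rc < 0)
		return rc;

	/* ...then per-mbuf initialization (unrolled in the real function). */
	for (i = 0; i < count; i++) {
		/* Assumption: raw mbufs from the pool carry refcnt 0. */
		rte_mbuf_refcnt_set(mbufs[i], 1);
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}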
The question is not whether the function is useful, not at all.
The question is whether the real-world case sees any measurable difference
in performance if the function is made non-inline.

	- Panu -

> Konstantin
>
>> 	- Panu -
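(Also for reference, a rough sketch of the kind of microbenchmark described
earlier in the thread: 10M rounds, 8 mbufs bulk-allocated and freed per
round, cycles taken from the TSC. The pool parameters, names and output
format are assumptions; only rte_pktmbuf_alloc_bulk() itself is the API
under discussion, and its exact behaviour is defined by the patch, not by
this sketch.)

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>

#include <rte_eal.h>
#include <rte_lcore.h>
#include <rte_debug.h>
#include <rte_cycles.h>
#include <rte_mempool.h>
#include <rte_mbuf.h>

#define ROUNDS (10 * 1000 * 1000)
#define BURST  8

int
main(int argc, char **argv)
{
	struct rte_mbuf *burst[BURST];
	struct rte_mempool *mp;
	uint64_t start, cycles;
	unsigned i, j;

	if (rte_eal_init(argc, argv) < 0)
		rte_exit(EXIT_FAILURE, "EAL init failed\n");

	/* Pool size, cache size and buffer size below are assumptions. */
	mp = rte_pktmbuf_pool_create("bench_pool", 8191, 256, 0,
				     RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());
	if (mp == NULL)
		rte_exit(EXIT_FAILURE, "mempool creation failed\n");

	start = rte_rdtsc();
	for (i = 0; i < ROUNDS; i++) {
		/* 8 mbufs allocated through the bulk API, then freed. */
		if (rte_pktmbuf_alloc_bulk(mp, burst, BURST) != 0)
			rte_exit(EXIT_FAILURE, "allocation failed\n");
		for (j = 0; j < BURST; j++)
			rte_pktmbuf_free(burst[j]);
	}
	cycles = rte_rdtsc() - start;

	printf("%d rounds: %" PRIu64 " cycles total, %.1f cycles/round\n",
	       ROUNDS, cycles, (double)cycles / ROUNDS);
	return 0;
}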