From: "Xie, Huawei"
To: "Ananyev, Konstantin", Panu Matilainen, Olivier MATZ, "dev@dpdk.org"
Cc: "dprovan@bivio.net"
Date: Fri, 26 Feb 2016 07:39:23 +0000
Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

On 2/24/2016 9:23 PM, Ananyev, Konstantin wrote:
> Hi Panu,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen
>> Sent: Wednesday, February 24, 2016 12:12 PM
>> To: Xie, Huawei; Olivier MATZ; dev@dpdk.org
>> Cc: dprovan@bivio.net
>> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
>>
>> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
>>> On 2/22/2016 10:52 PM, Xie, Huawei wrote:
>>>> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>>>>> Hi,
>>>>>
>>>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>>>>> the library ABI and should not be listed in the version map.
>>>>>>
>>>>>> I assume it's inline for performance reasons, but then you lose the
>>>>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>>>>> improve it just by updating the library. Since the point of having a
>>>>>> bulk API is to improve performance by reducing the number of calls
>>>>>> required, does it really have to be inline?
>>>>>> As in, have you actually measured the difference between inline and
>>>>>> non-inline and decided it's worth all the downsides?
>>>>> Agree with Panu. It would be interesting to compare the performance
>>>>> between inline and non-inline to decide whether to inline it or not.
>>>> Will update after I have gathered more data. Inline could show an
>>>> obvious performance difference in some cases.
>>> Panu and Olivier:
>>> I wrote a simple benchmark. The benchmark runs 10M rounds; in each round
>>> 8 mbufs are allocated through the bulk API and then freed.
>>> These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
>>> 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
>>> Btw, I have removed some outlier samples, which occurred with a
>>> frequency of roughly 1/10; sometimes the observed user CPU usage
>>> suddenly disappeared, and I have no clue what happened.
>>>
>>> With 8 mbufs allocated, there is about a 6% performance increase using
>>> inline.
>> [...]
>>> With 16 mbufs allocated, we can still observe an obvious performance
>>> difference, though only 1%-2%.
>>
>> [...]
>>> With 32/64 mbufs allocated, the deviation of the data itself hides the
>>> performance difference.
>>> So we prefer using inline for performance.
>> At least I was more after real-world performance in a real-world
>> use case rather than CPU cycles in a microbenchmark; we know function
>> calls have a cost, but the benefits tend to outweigh the cons.

It depends on what could be called the real-world case; that could be
argued. I think the case Konstantin mentioned could be called a real-world
one.
If your opinion on whether to use a microbenchmark or a real-world use case
is not specific to this bulk API, then I have a different opinion. For
example, for kernel virtio optimization, people use a vring bench. We
cannot guarantee that every small optimization brings an obvious
performance gain in some big workload; the gain could be hidden if the
bottleneck is elsewhere, so I also plan to build that kind of virtio bench
in DPDK. (A rough sketch of the bulk alloc/free benchmark loop is appended
at the end of this mail.)

Finally, I am open to inline or not, but for now the priority goes to
performance. If we make it a non-inline API now, we cannot easily step back
in the future; the other direction remains possible later, once we have
more confidence. We could even review every inline "API" and decide whether
it should stay inline or live in the library.

>>
>> Inline functions have their place and they're far less evil in
>> project-internal use, but in a library's public API they are BAD and
>> should be ... well, not banned, because there are exceptions to every
>> rule, but highly discouraged.
> Why is that?
> As you can see, right now we have all mbuf alloc/free routines as static
> inline, and I think we would like to keep it like that.
> So why should that particular function be different?
> After all, that function is nothing more than a wrapper around
> rte_mempool_get_bulk() plus a {rte_pktmbuf_reset()} loop unrolled by 4.
> So unless the mempool get/put API changes, I can hardly see how there
> could be any ABI breakage in the future.
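
For reference, the wrapper described above is roughly of the following
shape. This is only a sketch of the idea (one bulk dequeue from the mempool
plus a per-mbuf reset), with an illustrative name; it is not the exact
patch code, which unrolls the reset loop by 4:

#include <rte_branch_prediction.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch only: allocate 'count' mbufs from 'pool' with one mempool
 * operation, then initialise each mbuf the way rte_pktmbuf_alloc() would. */
static inline int
pktmbuf_alloc_bulk_sketch(struct rte_mempool *pool,
			  struct rte_mbuf **mbufs, unsigned int count)
{
	unsigned int i;
	int rc;

	/* One bulk dequeue instead of 'count' single dequeues. */
	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (unlikely(rc != 0))
		return rc;

	/* Reset each mbuf; the actual patch unrolls this loop by 4. */
	for (i = 0; i < count; i++) {
		rte_mbuf_refcnt_set(mbufs[i], 1);
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}
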
> About the 'real world' performance gain - it was a 'real world'
> performance problem that we tried to solve by introducing that function:
> http://dpdk.org/ml/archives/dev/2015-May/017633.html
>
> And according to the user feedback, it does help:
> http://dpdk.org/ml/archives/dev/2016-February/033203.html
>
> Konstantin
>
>> - Panu -
>>
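
PS: for anyone who wants to reproduce the cycle numbers above, the
benchmark loop is roughly of the following shape. This is a minimal sketch
with an assumed mempool and a fixed burst size of 8, not the exact code I
ran (no outlier filtering, no per-round statistics):

#include <inttypes.h>
#include <stdio.h>

#include <rte_cycles.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define BENCH_ROUNDS 10000000
#define BENCH_BURST  8

/* Sketch only: each round allocates BENCH_BURST mbufs through the bulk
 * API and frees them again; the total TSC cycle count is printed at the
 * end. */
static void
bench_alloc_bulk(struct rte_mempool *mp)
{
	struct rte_mbuf *burst[BENCH_BURST];
	uint64_t start, cycles;
	unsigned int i, j;

	start = rte_rdtsc();
	for (i = 0; i < BENCH_ROUNDS; i++) {
		if (rte_pktmbuf_alloc_bulk(mp, burst, BENCH_BURST) != 0)
			continue;	/* pool exhausted, skip this round */
		for (j = 0; j < BENCH_BURST; j++)
			rte_pktmbuf_free(burst[j]);
	}
	cycles = rte_rdtsc() - start;
	printf("%d rounds of %d mbufs: %" PRIu64 " cycles total\n",
	       BENCH_ROUNDS, BENCH_BURST, cycles);
}
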