From: "Xie, Huawei"
To: "Ananyev, Konstantin", Panu Matilainen, Olivier MATZ, "dev@dpdk.org"
Cc: "dprovan@bivio.net"
Date: Fri, 26 Feb 2016 07:39:23 +0000
Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

On 2/24/2016 9:23 PM, Ananyev, Konstantin wrote:
> Hi Panu,
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Panu Matilainen
>> Sent: Wednesday, February 24, 2016 12:12 PM
>> To: Xie, Huawei; Olivier MATZ; dev@dpdk.org
>> Cc: dprovan@bivio.net
>> Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API
>>
>> On 02/23/2016 07:35 AM, Xie, Huawei wrote:
>>> On 2/22/2016 10:52 PM, Xie, Huawei wrote:
>>>> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>>>>> Hi,
>>>>>
>>>>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>>>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>>>>> the library ABI and should not be listed in the version map.
>>>>>>
>>>>>> I assume it's inline for performance reasons, but then you lose the
>>>>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>>>>> improve it just by updating the library. Since the point of having a
>>>>>> bulk API is to improve performance by reducing the number of calls
>>>>>> required, does it really have to be inline?
>>>>>> As in, have you actually measured the difference between inline and
>>>>>> non-inline and decided it's worth all the downsides?
>>>>> Agree with Panu. It would be interesting to compare the performance
>>>>> between inline and non-inline to decide whether to inline it or not.
>>>> Will update after I have gathered more data. Inline could show an
>>>> obvious performance difference in some cases.
>>> Panu and Olivier:
>>> I wrote a simple benchmark. The benchmark runs 10M rounds; in each round
>>> 8 mbufs are allocated through the bulk API and then freed.
>>> These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
>>> 2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded).
>>> Btw, I have removed some outlier samples, which occurred with a
>>> frequency of roughly 1/10; sometimes the observed user CPU usage
>>> suddenly disappeared, and I have no clue what happened.
>>>
>>> With 8 mbufs allocated, there is about a 6% performance increase using
>>> inline.
>> [...]
>>> With 16 mbufs allocated, we can still observe an obvious performance
>>> difference, though only 1%-2%.
>>
>> [...]
>>> With 32/64 mbufs allocated, the deviation of the data itself hides the
>>> performance difference.
>>> So we prefer using inline for performance.
>> At least I was more after real-world performance in a real-world
>> use case rather than CPU cycles in a microbenchmark; we know function
>> calls have a cost, but the benefits tend to outweigh the cons.

It depends on what could be called the real-world case; that could be
argued. I think the case Konstantin mentioned could be called a real-world
one.
If your opinion on whether to use a microbenchmark or a real-world use case
is not specific to this bulk API, then I have a different opinion. For
example, for kernel virtio optimization, people use a vring bench. We
cannot guarantee that every small optimization brings an obvious
performance gain in some big workload; the gain could be hidden if the
bottleneck is elsewhere, so I also plan to build that kind of virtio bench
in DPDK. (A rough sketch of the bulk alloc/free benchmark loop is appended
at the end of this mail.)

Finally, I am open to inline or not, but for now the priority goes to
performance. If we make it a non-inline API now, we cannot easily step back
in the future; the other direction remains possible later, once we have
more confidence. We could even review every inline "API" and decide whether
it should stay inline or live in the library.

>>
>> Inline functions have their place and they're far less evil in
>> project-internal use, but in a library's public API they are BAD and
>> should be ... well, not banned, because there are exceptions to every
>> rule, but highly discouraged.
> Why is that?
> As you can see, right now we have all mbuf alloc/free routines as static
> inline, and I think we would like to keep it like that.
> So why should that particular function be different?
> After all, that function is nothing more than a wrapper around
> rte_mempool_get_bulk() plus a {rte_pktmbuf_reset()} loop unrolled by 4.
> So unless the mempool get/put API changes, I can hardly see how there
> could be any ABI breakage in the future.
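
For reference, the wrapper described above is roughly of the following
shape. This is only a sketch of the idea (one bulk dequeue from the mempool
plus a per-mbuf reset), with an illustrative name; it is not the exact
patch code, which unrolls the reset loop by 4:

#include <rte_branch_prediction.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch only: allocate 'count' mbufs from 'pool' with one mempool
 * operation, then initialise each mbuf the way rte_pktmbuf_alloc() would. */
static inline int
pktmbuf_alloc_bulk_sketch(struct rte_mempool *pool,
			  struct rte_mbuf **mbufs, unsigned int count)
{
	unsigned int i;
	int rc;

	/* One bulk dequeue instead of 'count' single dequeues. */
	rc = rte_mempool_get_bulk(pool, (void **)mbufs, count);
	if (unlikely(rc != 0))
		return rc;

	/* Reset each mbuf; the actual patch unrolls this loop by 4. */
	for (i = 0; i < count; i++) {
		rte_mbuf_refcnt_set(mbufs[i], 1);
		rte_pktmbuf_reset(mbufs[i]);
	}
	return 0;
}
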
> About the 'real world' performance gain - it was a 'real world'
> performance problem that we tried to solve by introducing that function:
> http://dpdk.org/ml/archives/dev/2015-May/017633.html
>
> And according to the user feedback, it does help:
> http://dpdk.org/ml/archives/dev/2016-February/033203.html
>
> Konstantin
>
>> - Panu -
>>
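
PS: for anyone who wants to reproduce the cycle numbers above, the
benchmark loop is roughly of the following shape. This is a minimal sketch
with an assumed mempool and a fixed burst size of 8, not the exact code I
ran (no outlier filtering, no per-round statistics):

#include <inttypes.h>
#include <stdio.h>

#include <rte_cycles.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define BENCH_ROUNDS 10000000
#define BENCH_BURST  8

/* Sketch only: each round allocates BENCH_BURST mbufs through the bulk
 * API and frees them again; the total TSC cycle count is printed at the
 * end. */
static void
bench_alloc_bulk(struct rte_mempool *mp)
{
	struct rte_mbuf *burst[BENCH_BURST];
	uint64_t start, cycles;
	unsigned int i, j;

	start = rte_rdtsc();
	for (i = 0; i < BENCH_ROUNDS; i++) {
		if (rte_pktmbuf_alloc_bulk(mp, burst, BENCH_BURST) != 0)
			continue;	/* pool exhausted, skip this round */
		for (j = 0; j < BENCH_BURST; j++)
			rte_pktmbuf_free(burst[j]);
	}
	cycles = rte_rdtsc() - start;
	printf("%d rounds of %d mbufs: %" PRIu64 " cycles total\n",
	       BENCH_ROUNDS, BENCH_BURST, cycles);
}
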