From: "Xie, Huawei"
To: Olivier MATZ, Panu Matilainen, "dev@dpdk.org"
Cc: "dprovan@bivio.net"
Date: Tue, 23 Feb 2016 05:35:08 +0000
Subject: Re: [dpdk-dev] [PATCH v6 1/2] mbuf: provide rte_pktmbuf_alloc_bulk API

On 2/22/2016 10:52 PM, Xie, Huawei wrote:
> On 2/4/2016 1:24 AM, Olivier MATZ wrote:
>> Hi,
>>
>> On 01/27/2016 02:56 PM, Panu Matilainen wrote:
>>> Since rte_pktmbuf_alloc_bulk() is an inline function, it is not part of
>>> the library ABI and should not be listed in the version map.
>>>
>>> I assume it's inline for performance reasons, but then you lose the
>>> benefits of dynamic linking, such as the ability to fix bugs and/or
>>> improve it by just updating the library. Since the point of having a
>>> bulk API is to improve performance by reducing the number of calls
>>> required, does it really have to be inline? As in, have you actually
>>> measured the difference between inline and non-inline and decided it's
>>> worth all the downsides?
>> Agree with Panu. It would be interesting to compare the performance
>> between inline and non-inline to decide whether to inline it or not.
> Will update after I gather more data. Inline could show an obvious
> performance difference in some cases.

Panu and Olivier:
I wrote a simple benchmark. It runs 10M rounds; in each round, 8 mbufs
are allocated through the bulk API and then freed.
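Roughly, the measurement loop looks like the sketch below. This is not
the exact code I ran; the names (NB_ROUNDS, BULK_SIZE) and the mbuf pool
setup are illustrative only.

/* Rough sketch of the measurement loop, not the exact benchmark code.
 * Assumes EAL is initialized and "pool" is a pre-created pktmbuf pool. */
#include <stdint.h>
#include <rte_cycles.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define NB_ROUNDS (10 * 1000 * 1000)    /* 10M rounds */
#define BULK_SIZE 8                     /* 8 here; 16/32/64 in the other runs */

static uint64_t
measure_bulk_alloc_free(struct rte_mempool *pool)
{
	struct rte_mbuf *mbufs[BULK_SIZE];
	uint64_t start, end;
	unsigned int i, j;

	start = rte_rdtsc();
	for (i = 0; i < NB_ROUNDS; i++) {
		/* allocate BULK_SIZE mbufs through the bulk API... */
		if (rte_pktmbuf_alloc_bulk(pool, mbufs, BULK_SIZE) != 0)
			return 0;
		/* ...and free them again */
		for (j = 0; j < BULK_SIZE; j++)
			rte_pktmbuf_free(mbufs[j]);
	}
	end = rte_rdtsc();

	return end - start;     /* total CPU cycles for all rounds */
}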
These are the CPU cycles measured (Intel(R) Xeon(R) CPU E5-2680 0 @
2.70GHz, CPU isolated, timer interrupt disabled, RCU offloaded). By the
way, I have removed some outliers, which showed up in roughly 1 run in
10: in those runs the measured user CPU usage suddenly dropped, and I
have no clue what happened.

With 8 mbufs allocated per round, inlining gives about a 6% performance
increase:

inline          non-inline
2780738888      2950309416
2834853696      2951378072
2823015320      2954500888
2825060032      2958939912
2824499804      2898938284
2810859720      2944892796
2852229420      3014273296
2787308500      2956809852
2793337260      2958674900
2822223476      2954346352
2785455184      2925719136
2821528624      2937380416
2822922136      2974978604
2776645920      2947666548
2815952572      2952316900
2801048740      2947366984
2851462672      2946469004

With 16 mbufs allocated per round, there is still a clear performance
difference, though only 1%-2%:

inline          non-inline
5519987084      5669902680
5538416096      5737646840
5578934064      5590165532
5548131972      5767926840
5625585696      5831345628
5558282876      5662223764
5445587768      5641003924
5559096320      5775258444
5656437988      5743969272
5440939404      5664882412
5498875968      5785138532
5561652808      5737123940
5515211716      5627775604
5550567140      5630790628
5665964280      5589568164
5591295900      5702697308

With 32 or 64 mbufs allocated per round, the run-to-run variation of the
data itself hides the performance difference.

So we prefer keeping it inline for performance.

>> Also, it would be nice to have a simple test function in
>> app/test/test_mbuf.c. For instance, you could update
>> test_one_pktmbuf() to take an mbuf pointer as a parameter and remove
>> the mbuf allocation from the function. Then it could be called with
>> an mbuf allocated with rte_pktmbuf_alloc() (like before) and with
>> all the mbufs of rte_pktmbuf_alloc_bulk().

I don't quite get you. Do you mean we write two cases, one that
allocates mbufs through rte_pktmbuf_alloc_bulk() and one that uses
rte_pktmbuf_alloc()? That would be good to have; I could do it after
this patch. (A rough sketch of what I have in mind is at the end of
this mail.)

>>
>> Regards,
>> Olivier
>>
>
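Sketch of the test case mentioned above. It assumes test_one_pktmbuf()
has been changed to take a pre-allocated mbuf and return a negative
value on failure, as you suggest; the function name and BULK_CNT are
illustrative, not final test code.

/* Sketch only: exercises both the single and the bulk allocation path
 * and runs the existing per-mbuf checks on each mbuf. */
#define BULK_CNT 8

static int
test_pktmbuf_bulk(struct rte_mempool *pktmbuf_pool)
{
	struct rte_mbuf *mbufs[BULK_CNT];
	struct rte_mbuf *m;
	unsigned int i;
	int ret;

	/* case 1: single allocation, as before */
	m = rte_pktmbuf_alloc(pktmbuf_pool);
	if (m == NULL)
		return -1;
	ret = test_one_pktmbuf(m);
	rte_pktmbuf_free(m);
	if (ret < 0)
		return -1;

	/* case 2: bulk allocation, same checks on each bulk-allocated mbuf */
	if (rte_pktmbuf_alloc_bulk(pktmbuf_pool, mbufs, BULK_CNT) != 0)
		return -1;
	ret = 0;
	for (i = 0; i < BULK_CNT; i++) {
		if (test_one_pktmbuf(mbufs[i]) < 0)
			ret = -1;
	}
	for (i = 0; i < BULK_CNT; i++)
		rte_pktmbuf_free(mbufs[i]);

	return ret;
}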