From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <david.hunt@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id C939DC678
 for <dev@dpdk.org>; Fri, 24 Jun 2016 17:56:42 +0200 (CEST)
Received: from fmsmga003.fm.intel.com ([10.253.24.29])
 by fmsmga103.fm.intel.com with ESMTP; 24 Jun 2016 08:56:42 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.26,521,1459839600"; 
 d="scan'208,217";a="724528298"
Received: from dhunt5-mobl.ger.corp.intel.com (HELO [10.237.220.49])
 ([10.237.220.49])
 by FMSMGA003.fm.intel.com with ESMTP; 24 Jun 2016 08:56:39 -0700
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>,
 Olivier Matz <olivier.matz@6wind.com>
References: <1464101442-10501-1-git-send-email-jerin.jacob@caviumnetworks.com>
 <57446C63.4040605@6wind.com> <20160524151654.GA10870@localhost.localdomain>
Cc: dev@dpdk.org, thomas.monjalon@6wind.com, bruce.richardson@intel.com,
 konstantin.ananyev@intel.com
From: "Hunt, David" <david.hunt@intel.com>
Message-ID: <576D5837.3060907@intel.com>
Date: Fri, 24 Jun 2016 16:56:39 +0100
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101
 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <20160524151654.GA10870@localhost.localdomain>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
X-Content-Filtered-By: Mailman/MimeDel 2.1.15
Subject: Re: [dpdk-dev] [PATCH] mbuf: replace c memcpy code semantics with
 optimized rte_memcpy
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Fri, 24 Jun 2016 15:56:44 -0000

Hi Jerin,

I just ran a couple of tests on this patch on the latest master head on 
a couple of machines: an older quad-socket E5-4650 and a quad-socket 
E5-2699 v3.

E5-4650:
I'm seeing a gain of 2% on the un-cached tests and a gain of 9% on the 
cached tests.

E5-2699 v3:
I'm seeing a loss of 0.1% on the un-cached tests and a gain of 11% on 
the cached tests.

This is purely the autotest comparison; I don't have traffic-generator 
results. But based on the above, I don't think there are any performance 
issues with the patch.
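
For context, the change at the heart of the patch swaps a per-pointer
copy loop in __mempool_put_bulk() for a single bulk copy of the pointer
array. A minimal standalone sketch of the two forms (plain memcpy()
stands in for rte_memcpy() here, and the cache array and function names
are made up for illustration):

/* sketch.c -- not the DPDK sources: contrasts the per-pointer loop the
 * patch removes with the single bulk copy it adds. */
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 512

static void *cache_objs[CACHE_SIZE];

/* Old semantics: copy each object pointer individually. */
static void put_bulk_loop(void * const *obj_table, unsigned n)
{
	unsigned index;

	for (index = 0; index < n; ++index, obj_table++)
		cache_objs[index] = *obj_table;
}

/* New semantics: one bulk copy of n pointers, i.e. sizeof(void *) * n
 * bytes; rte_memcpy() takes the same arguments in the real patch. */
static void put_bulk_memcpy(void * const *obj_table, unsigned n)
{
	memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
}

int main(void)
{
	int objs[4];
	void *table[4] = { &objs[0], &objs[1], &objs[2], &objs[3] };
	unsigned i;

	put_bulk_loop(table, 4);
	for (i = 0; i < 4; i++)
		assert(cache_objs[i] == table[i]);

	put_bulk_memcpy(table, 4);
	for (i = 0; i < 4; i++)
		assert(cache_objs[i] == table[i]);
	return 0;
}

Both forms move the same bytes; the patch only changes how the copy is
emitted, which is presumably where the cached-test gains above come from.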

Regards,
Dave.

On 24/5/2016 4:17 PM, Jerin Jacob wrote:
> On Tue, May 24, 2016 at 04:59:47PM +0200, Olivier Matz wrote:
>> Hi Jerin,
>>
>>
>> On 05/24/2016 04:50 PM, Jerin Jacob wrote:
>>> Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>
>>> ---
>>>   lib/librte_mempool/rte_mempool.h | 5 ++---
>>>   1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
>>> index ed2c110..ebe399a 100644
>>> --- a/lib/librte_mempool/rte_mempool.h
>>> +++ b/lib/librte_mempool/rte_mempool.h
>>> @@ -74,6 +74,7 @@
>>>   #include <rte_memory.h>
>>>   #include <rte_branch_prediction.h>
>>>   #include <rte_ring.h>
>>> +#include <rte_memcpy.h>
>>>   
>>>   #ifdef __cplusplus
>>>   extern "C" {
>>> @@ -917,7 +918,6 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>   		    unsigned n, __rte_unused int is_mp)
>>>   {
>>>   	struct rte_mempool_cache *cache;
>>> -	uint32_t index;
>>>   	void **cache_objs;
>>>   	unsigned lcore_id = rte_lcore_id();
>>>   	uint32_t cache_size = mp->cache_size;
>>> @@ -946,8 +946,7 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>>>   	 */
>>>   
>>>   	/* Add elements back into the cache */
>>> -	for (index = 0; index < n; ++index, obj_table++)
>>> -		cache_objs[index] = *obj_table;
>>> +	rte_memcpy(&cache_objs[0], obj_table, sizeof(void *) * n);
>>>   
>>>   	cache->len += n;
>>>   
>>>
>> The commit title should be "mempool" instead of "mbuf".
> I will fix it.
>
>> Are you seeing some performance improvement by using rte_memcpy()?
> Yes, in some cases. In the default case, the loop was replaced with memcpy
> by the compiler itself (gcc 5.3). But when I tried the external mempool
> manager patch, performance dropped by almost 800 Kpps. Debugging further,
> it turned out that an unrelated change in the external mempool manager was
> knocking out the memcpy; explicit rte_memcpy brought back 500 Kpps. The
> remaining 300 Kpps drop is still unknown (in my test setup, packets are in
> the local cache, so it must be something to do with a __mempool_put_bulk
> text alignment change or similar).
>
> Has anyone else observed a performance drop with the external pool manager?
>
> Jerin
>
>> Regards
>> Olivier
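
As a footnote to Jerin's point above about gcc replacing the loop: that
conversion can be checked in isolation. A minimal sketch (the file and
function names are made up, and the exact outcome depends on gcc version
and flags):

/* loopcheck.c -- compile with "gcc -O3 -S loopcheck.c" and look for a
 * call to memcpy (or an equivalent inlined copy sequence) in the
 * generated loopcheck.s. */
void copy_ptrs(void **dst, void * const *src, unsigned n)
{
	unsigned i;

	/* Same shape as the loop the patch removes; gcc's loop pattern
	 * recognition (enabled at -O3) can typically replace it with a
	 * single memcpy of sizeof(void *) * n bytes. */
	for (i = 0; i < n; ++i, src++)
		dst[i] = *src;
}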