Date: Mon, 1 Jul 2019 17:00:06 +0200
From: Olivier Matz
To: "Wang, Xiao W"
Cc: Andrew Rybchenko, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] mempool: optimize copy in cache get
Message-ID: <20190701150006.cxyqbzyf75n373q2@platinum>

Hi,

On Mon, Jul 01, 2019 at 02:21:41PM +0000, Wang, Xiao W wrote:
> Hi,
>
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, July 1, 2019 9:11 PM
> > To: Andrew Rybchenko
> > Cc: Wang, Xiao W; dev@dpdk.org
> > Subject: Re: [PATCH] mempool: optimize copy in cache get
> >
> > Hi,
> >
> > On Tue, May 21, 2019 at 12:34:55PM +0300, Andrew Rybchenko wrote:
> > > On 5/21/19 12:03 PM, Xiao Wang wrote:
> > > > Use rte_memcpy to improve the pointer array copy. This optimization
> > > > method has already been applied to __mempool_generic_put() [1]; this
> > > > patch applies it to __mempool_generic_get(). A slight performance gain
> > > > can be observed in the testpmd txonly test.
> > > >
> > > > [1] 863bfb47449 ("mempool: optimize copy in cache")
> > > >
> > > > Signed-off-by: Xiao Wang
> > > > ---
> > > >  lib/librte_mempool/rte_mempool.h | 7 +------
> > > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > > > index 8053f7a04..975da8d22 100644
> > > > --- a/lib/librte_mempool/rte_mempool.h
> > > > +++ b/lib/librte_mempool/rte_mempool.h
> > > > @@ -1344,15 +1344,11 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > > >  	unsigned int n, struct rte_mempool_cache *cache)
> > > >  {
> > > >  	int ret;
> > > > -	uint32_t index, len;
> > > > -	void **cache_objs;
> > > >
> > > >  	/* No cache provided or cannot be satisfied from cache */
> > > >  	if (unlikely(cache == NULL || n >= cache->size))
> > > >  		goto ring_dequeue;
> > > >
> > > > -	cache_objs = cache->objs;
> > > > -
> > > >  	/* Can this be satisfied from the cache? */
> > > >  	if (cache->len < n) {
> > > >  		/* No. Backfill the cache first, and then fill from it */
> > > > @@ -1375,8 +1371,7 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > > >  	}
> > > >
> > > >  	/* Now fill in the response ... */
> > > > -	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
> > > > -		*obj_table = cache_objs[len];
> > > > +	rte_memcpy(obj_table, &cache->objs[cache->len - n], sizeof(void *) * n);
> > > >
> > > >  	cache->len -= n;
> > >
> > > I think the idea of the loop above is to get objects in reverse order in
> > > order to reuse the cache-top objects (put last) first. It should improve
> > > the cache hit rate, etc. So the performance effect of the patch could be
> > > very different on various CPUs (with different cache sizes) and various
> > > workloads.
> > >
> > > So, I doubt that it is a step in the right direction.
> >
> > For reference, this was already discussed 3 years ago:
> >
> > https://mails.dpdk.org/archives/dev/2016-May/039873.html
> > https://mails.dpdk.org/archives/dev/2016-June/040029.html
> >
> > I'm still not convinced that reversing object addresses (as is done
> > today) is really important. But Andrew is probably right: the impact of
> > this kind of patch probably varies depending on many factors. More
> > performance numbers on real-life use cases would help to decide what to
> > do.
> >
> > Regards,
> > Olivier
>
> I agree, and thanks for the reference links. So theoretically neither way is
> a definite best choice; it depends on various real-life factors. I'm thinking
> about how to make application developers aware of this so that they can make
> the choice themselves. Or is it not worth doing, given the small perf gain?

I don't think it's worth doing a dynamic selection for this. On the other
hand, having performance numbers for different use cases/archs is welcome.
From a pure CPU-cycles perspective, rte_memcpy() should be faster.

Regards,
Olivier
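
The ordering difference the thread is debating can be demonstrated with a
minimal standalone sketch. This is not the actual DPDK implementation: the
names (cache_objs, cache_len, get_reverse, get_memcpy), the fixed CACHE_SIZE,
and the use of plain memcpy() instead of rte_memcpy() are assumptions made
purely for illustration.

/*
 * Standalone sketch of the two copy strategies discussed above.
 * NOT the DPDK code: all names here are invented for illustration,
 * and plain memcpy() stands in for rte_memcpy().
 */
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 8

static void *cache_objs[CACHE_SIZE];
static unsigned int cache_len = CACHE_SIZE;

/* Strategy 1 (pre-patch): copy in reverse, so the objects put back
 * most recently (top of the cache stack) are handed out first,
 * preserving LIFO order and thus the CPU-cache warmth of the objects. */
static void
get_reverse(void **obj_table, unsigned int n)
{
	unsigned int index, len;

	for (index = 0, len = cache_len - 1; index < n; ++index, len--, obj_table++)
		*obj_table = cache_objs[len];
	cache_len -= n;
}

/* Strategy 2 (the patch): one forward bulk copy of the top n slots.
 * Fewer cycles for the copy itself, but within the returned block the
 * most recently freed object comes last instead of first. */
static void
get_memcpy(void **obj_table, unsigned int n)
{
	memcpy(obj_table, &cache_objs[cache_len - n], sizeof(void *) * n);
	cache_len -= n;
}

int
main(void)
{
	static int objs[CACHE_SIZE];
	void *out[4];
	unsigned int i;

	for (i = 0; i < CACHE_SIZE; i++)
		cache_objs[i] = &objs[i];

	get_reverse(out, 4);
	printf("reverse:");
	for (i = 0; i < 4; i++)
		printf(" %td", (int *)out[i] - objs);	/* 7 6 5 4 */
	printf("\n");

	cache_len = CACHE_SIZE;	/* reset the cache for the second strategy */
	get_memcpy(out, 4);
	printf("memcpy: ");
	for (i = 0; i < 4; i++)
		printf(" %td", (int *)out[i] - objs);	/* 4 5 6 7 */
	printf("\n");
	return 0;
}

Running it prints indices 7 6 5 4 for the reverse loop and 4 5 6 7 for the
bulk copy: both hand out the same four objects, but only the reverse loop
returns the most recently freed (and therefore most likely cache-hot)
object first, which is exactly the trade-off Andrew points out.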