Date: Mon, 1 Jul 2019 17:00:06 +0200
From: Olivier Matz
To: "Wang, Xiao W"
Cc: Andrew Rybchenko, dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH] mempool: optimize copy in cache get
Message-ID: <20190701150006.cxyqbzyf75n373q2@platinum>

Hi,

On Mon, Jul 01, 2019 at 02:21:41PM +0000, Wang, Xiao W wrote:
> Hi,
>
> > -----Original Message-----
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, July 1, 2019 9:11 PM
> > To: Andrew Rybchenko
> > Cc: Wang, Xiao W; dev@dpdk.org
> > Subject: Re: [PATCH] mempool: optimize copy in cache get
> >
> > Hi,
> >
> > On Tue, May 21, 2019 at 12:34:55PM +0300, Andrew Rybchenko wrote:
> > > On 5/21/19 12:03 PM, Xiao Wang wrote:
> > > > Use rte_memcpy to improve the pointer array copy. This optimization
> > > > method has already been applied to __mempool_generic_put() [1]; this
> > > > patch applies it to __mempool_generic_get(). A slight performance gain
> > > > can be observed in the testpmd txonly test.
> > > >
> > > > [1] 863bfb47449 ("mempool: optimize copy in cache")
> > > >
> > > > Signed-off-by: Xiao Wang
> > > > ---
> > > >  lib/librte_mempool/rte_mempool.h | 7 +------
> > > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > > >
> > > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > > > index 8053f7a04..975da8d22 100644
> > > > --- a/lib/librte_mempool/rte_mempool.h
> > > > +++ b/lib/librte_mempool/rte_mempool.h
> > > > @@ -1344,15 +1344,11 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > > >  	unsigned int n, struct rte_mempool_cache *cache)
> > > >  {
> > > >  	int ret;
> > > > -	uint32_t index, len;
> > > > -	void **cache_objs;
> > > >
> > > >  	/* No cache provided or cannot be satisfied from cache */
> > > >  	if (unlikely(cache == NULL || n >= cache->size))
> > > >  		goto ring_dequeue;
> > > >
> > > > -	cache_objs = cache->objs;
> > > > -
> > > >  	/* Can this be satisfied from the cache? */
> > > >  	if (cache->len < n) {
> > > >  		/* No. Backfill the cache first, and then fill from it */
> > > > @@ -1375,8 +1371,7 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > > >  	}
> > > >
> > > >  	/* Now fill in the response ... */
> > > > -	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
> > > > -		*obj_table = cache_objs[len];
> > > > +	rte_memcpy(obj_table, &cache->objs[cache->len - n], sizeof(void *) * n);
> > > >
> > > >  	cache->len -= n;
> > >
> > > I think the idea of the loop above is to get objects in reverse order in
> > > order to reuse the cache-top objects (put last) first. It should improve
> > > the cache hit rate, etc. So the performance effect of the patch could be
> > > very different on various CPUs (with different cache sizes) and various
> > > workloads.
> > >
> > > So, I doubt that it is a step in the right direction.
> >
> > For reference, this was already discussed 3 years ago:
> >
> > https://mails.dpdk.org/archives/dev/2016-May/039873.html
> > https://mails.dpdk.org/archives/dev/2016-June/040029.html
> >
> > I'm still not convinced that reversing object addresses (as is done
> > today) is really important. But Andrew is probably right: the impact of
> > this kind of patch probably varies depending on many factors. More
> > performance numbers on real-life use cases would help to decide what to
> > do.
> >
> > Regards,
> > Olivier
>
> I agree, and thanks for the reference links. So theoretically neither way is
> a definite best choice; it depends on various real-life factors. I'm thinking
> about how to make application developers aware of this so that they can make
> the choice themselves. Or is it not worth doing, given the small perf gain?

I don't think it's worth doing a dynamic selection for this. On the other
hand, having performance numbers for different use cases/archs is welcome.
From a pure CPU-cycles perspective, rte_memcpy() should be faster.

Regards,
Olivier
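
The ordering difference the thread is debating can be demonstrated with a
minimal standalone sketch. This is not the actual DPDK implementation: the
names (cache_objs, cache_len, get_reverse, get_memcpy), the fixed CACHE_SIZE,
and the use of plain memcpy() instead of rte_memcpy() are assumptions made
purely for illustration.

/*
 * Standalone sketch of the two copy strategies discussed above.
 * NOT the DPDK code: all names here are invented for illustration,
 * and plain memcpy() stands in for rte_memcpy().
 */
#include <stdio.h>
#include <string.h>

#define CACHE_SIZE 8

static void *cache_objs[CACHE_SIZE];
static unsigned int cache_len = CACHE_SIZE;

/* Strategy 1 (pre-patch): copy in reverse, so the objects put back
 * most recently (top of the cache stack) are handed out first,
 * preserving LIFO order and thus the CPU-cache warmth of the objects. */
static void
get_reverse(void **obj_table, unsigned int n)
{
	unsigned int index, len;

	for (index = 0, len = cache_len - 1; index < n; ++index, len--, obj_table++)
		*obj_table = cache_objs[len];
	cache_len -= n;
}

/* Strategy 2 (the patch): one forward bulk copy of the top n slots.
 * Fewer cycles for the copy itself, but within the returned block the
 * most recently freed object comes last instead of first. */
static void
get_memcpy(void **obj_table, unsigned int n)
{
	memcpy(obj_table, &cache_objs[cache_len - n], sizeof(void *) * n);
	cache_len -= n;
}

int
main(void)
{
	static int objs[CACHE_SIZE];
	void *out[4];
	unsigned int i;

	for (i = 0; i < CACHE_SIZE; i++)
		cache_objs[i] = &objs[i];

	get_reverse(out, 4);
	printf("reverse:");
	for (i = 0; i < 4; i++)
		printf(" %td", (int *)out[i] - objs);	/* 7 6 5 4 */
	printf("\n");

	cache_len = CACHE_SIZE;	/* reset the cache for the second strategy */
	get_memcpy(out, 4);
	printf("memcpy: ");
	for (i = 0; i < 4; i++)
		printf(" %td", (int *)out[i] - objs);	/* 4 5 6 7 */
	printf("\n");
	return 0;
}

Running it prints indices 7 6 5 4 for the reverse loop and 4 5 6 7 for the
bulk copy: both hand out the same four objects, but only the reverse loop
returns the most recently freed (and therefore most likely cache-hot)
object first, which is exactly the trade-off Andrew points out.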