From: "Wang, Xiao W"
To: Olivier Matz, Andrew Rybchenko
CC: "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH] mempool: optimize copy in cache get
Date: Mon, 1 Jul 2019 14:21:41 +0000

Hi,

> -----Original Message-----
> From: Olivier Matz [mailto:olivier.matz@6wind.com]
> Sent: Monday, July 1, 2019 9:11 PM
> To: Andrew Rybchenko
> Cc: Wang, Xiao W; dev@dpdk.org
> Subject: Re: [PATCH] mempool: optimize copy in cache get
>
> Hi,
>
> On Tue, May 21, 2019 at 12:34:55PM +0300, Andrew Rybchenko wrote:
> > On 5/21/19 12:03 PM, Xiao Wang wrote:
> > > Use rte_memcpy to improve the pointer array copy.
> > > This optimization method has already been applied to
> > > __mempool_generic_put() [1]; this patch applies it to
> > > __mempool_generic_get(). A slight performance gain can be observed
> > > in the testpmd txonly test.
> > >
> > > [1] 863bfb47449 ("mempool: optimize copy in cache")
> > >
> > > Signed-off-by: Xiao Wang
> > > ---
> > >  lib/librte_mempool/rte_mempool.h | 7 +------
> > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > >
> > > diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> > > index 8053f7a04..975da8d22 100644
> > > --- a/lib/librte_mempool/rte_mempool.h
> > > +++ b/lib/librte_mempool/rte_mempool.h
> > > @@ -1344,15 +1344,11 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > >  		      unsigned int n, struct rte_mempool_cache *cache)
> > >  {
> > >  	int ret;
> > > -	uint32_t index, len;
> > > -	void **cache_objs;
> > >
> > >  	/* No cache provided or cannot be satisfied from cache */
> > >  	if (unlikely(cache == NULL || n >= cache->size))
> > >  		goto ring_dequeue;
> > >
> > > -	cache_objs = cache->objs;
> > > -
> > >  	/* Can this be satisfied from the cache? */
> > >  	if (cache->len < n) {
> > >  		/* No. Backfill the cache first, and then fill from it */
> > > @@ -1375,8 +1371,7 @@ __mempool_generic_get(struct rte_mempool *mp, void **obj_table,
> > >  	}
> > >
> > >  	/* Now fill in the response ... */
> > > -	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
> > > -		*obj_table = cache_objs[len];
> > > +	rte_memcpy(obj_table, &cache->objs[cache->len - n], sizeof(void *) * n);
> > >
> > >  	cache->len -= n;
> >
> > I think the idea of the loop above is to get objects in reverse order
> > so as to reuse the cache top objects (put last) first. It should
> > improve the cache hit rate, etc. So the performance effect of the
> > patch could be very different on various CPUs (with different cache
> > sizes) and various workloads.
> >
> > So, I doubt that it is a step in the right direction.
>
> For reference, this was already discussed 3 years ago:
>
> https://mails.dpdk.org/archives/dev/2016-May/039873.html
> https://mails.dpdk.org/archives/dev/2016-June/040029.html
>
> I'm still not convinced that reversing object addresses (as it's done
> today) is really important. But Andrew is probably right: the impact of
> this kind of patch probably varies depending on many factors. More
> performance numbers on real-life use-cases would help to decide what to
> do.
>
> Regards,
> Olivier

I agree, and thanks for the reference links. So theoretically neither way
can be a definite best choice; it depends on various real-life factors.
I'm thinking about how to make app developers aware of this so that they
can make the choice themselves. Or is it not worth doing, given the small
perf gain?

BRs,
Xiao
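
P.S. To make the ordering difference concrete, below is a minimal
standalone sketch (illustrative only, not part of the patch). Plain
memcpy and toy integer "objects" stand in for rte_memcpy and real
mempool objects so it builds without DPDK. Both strategies hand back the
same N pointers, but the existing loop places the most recently cached
(hottest) object at obj_table[0], while the bulk copy leaves it at
obj_table[N - 1].

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CACHE_LEN 8	/* toy cache depth, hypothetical value */
#define N 4		/* number of objects requested by the "get" */

int main(void)
{
	void *cache_objs[CACHE_LEN];
	void *table_loop[N], *table_bulk[N];
	void **obj_table;
	unsigned int index, len;

	/* Toy cache: slot CACHE_LEN - 1 holds the most recently put
	 * (presumably cache-hottest) object. */
	for (index = 0; index < CACHE_LEN; index++)
		cache_objs[index] = (void *)(uintptr_t)(index + 1);

	/* Current code: walk the cache tail backwards, so the hottest
	 * object lands in obj_table[0]. */
	obj_table = table_loop;
	for (index = 0, len = CACHE_LEN - 1; index < N; ++index, len--, obj_table++)
		*obj_table = cache_objs[len];

	/* Patched code: one bulk copy of the last N slots, preserving the
	 * cache order, so the hottest object lands in obj_table[N - 1]. */
	memcpy(table_bulk, &cache_objs[CACHE_LEN - N], sizeof(void *) * N);

	for (index = 0; index < N; index++)
		printf("slot %u: loop=%p bulk=%p\n", index,
		       table_loop[index], table_bulk[index]);
	return 0;
}

Running it prints the two result tables side by side: the same set of
pointers, in opposite order.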