From: "Ananyev, Konstantin"
To: Zoltan Kiss, "dev@dpdk.org"
Subject: Re: [dpdk-dev] [PATCH v2] mempool: improve cache search
Date: Thu, 2 Jul 2015 17:07:59 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836A21C32@irsmsx105.ger.corp.intel.com>
References: <1435258110-17140-1-git-send-email-zoltan.kiss@linaro.org>
 <1435741430-2088-1-git-send-email-zoltan.kiss@linaro.org>
In-Reply-To: <1435741430-2088-1-git-send-email-zoltan.kiss@linaro.org>

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Wednesday, July 01, 2015 10:04 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search
>
> The current way has a few problems:
>
> - if cache->len < n, we copy our elements into the cache first, then
>   into obj_table, that's unnecessary
> - if n >= cache_size (or the backfill fails), and we can't fulfil the
>   request from the ring alone, we don't try to combine with the cache
> - if refill fails, we don't return anything, even if the ring has enough
>   for our request
>
> This patch rewrites it severely:
> - at the first part of the function we only try the cache if cache->len >= n
> - otherwise take our elements straight from the ring
> - if that fails but we have something in the cache, try to combine them
> - the refill happens at the end, and its failure doesn't modify our return
>   value
>
> Signed-off-by: Zoltan Kiss
> ---
> v2:
> - fix subject
> - add unlikely for branch where request is fulfilled both from cache and ring
>
>  lib/librte_mempool/rte_mempool.h | 63 +++++++++++++++++++++++++---------------
>  1 file changed, 39 insertions(+), 24 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 6d4ce9a..1e96f03 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -947,34 +947,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>  	unsigned lcore_id = rte_lcore_id();
>  	uint32_t cache_size = mp->cache_size;
>
> -	/* cache is not enabled or single consumer */
> +	cache = &mp->local_cache[lcore_id];
> +	/* cache is not enabled or single consumer or not enough */
>  	if (unlikely(cache_size == 0 || is_mc == 0 ||
> -		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> +		     cache->len < n || lcore_id >= RTE_MAX_LCORE))
>  		goto ring_dequeue;
>
> -	cache = &mp->local_cache[lcore_id];
>  	cache_objs = cache->objs;
>
> -	/* Can this be satisfied from the cache? */
> -	if (cache->len < n) {
> -		/* No. Backfill the cache first, and then fill from it */
> -		uint32_t req = n + (cache_size - cache->len);
> -
> -		/* How many do we require i.e. number to fill the cache + the request */
> -		ret = rte_ring_mc_dequeue_bulk(mp->ring, &cache->objs[cache->len], req);
> -		if (unlikely(ret < 0)) {
> -			/*
> -			 * In the offchance that we are buffer constrained,
> -			 * where we are not able to allocate cache + n, go to
> -			 * the ring directly. If that fails, we are truly out of
> -			 * buffers.
> -			 */
> -			goto ring_dequeue;
> -		}
> -
> -		cache->len += req;
> -	}
> -
>  	/* Now fill in the response ... */
>  	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
>  		*obj_table = cache_objs[len];
> @@ -983,7 +963,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>
>  	__MEMPOOL_STAT_ADD(mp, get_success, n);
>
> -	return 0;
> +	ret = 0;
> +	goto cache_refill;
>
>  ring_dequeue:
>  #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> @@ -994,11 +975,45 @@ ring_dequeue:
>  	else
>  		ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +	if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) {
> +		uint32_t req = n - cache->len;
> +
> +		ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
> +		if (ret == 0) {
> +			cache_objs = cache->objs;
> +			obj_table += req;
> +			for (index = 0; index < cache->len;
> +					++index, ++obj_table)
> +				*obj_table = cache_objs[index];
> +			cache->len = 0;
> +		}
> +	}
> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> +
>  	if (ret < 0)
>  		__MEMPOOL_STAT_ADD(mp, get_fail, n);
>  	else
>  		__MEMPOOL_STAT_ADD(mp, get_success, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +cache_refill:

Ok, so if I get things right: if the lcore runs out of entries in the cache,
then on the next __mempool_get_bulk() it has to do a ring dequeue twice:
1. to satisfy the user request
2. to refill the cache.
Right?

If so, then I think the current approach - one ring dequeue to refill the
cache, then copying the entries from the cache to the user - is the cheaper
(faster) one in many cases, especially when the same pool is shared between
multiple threads, for example when a thread is doing RX only (no TX).

> +	/* If previous dequeue was OK and we have less than n, start refill */
> +	if (ret == 0 && cache_size > 0 && cache->len < n) {
> +		uint32_t req = cache_size - cache->len;

It could be that n > cache_size. In that case there is probably no point in
refilling the cache, as you took the entries from the ring and the cache was
left intact.

Konstantin

> +
> +		cache_objs = cache->objs;
> +		ret = rte_ring_mc_dequeue_bulk(mp->ring,
> +				&cache->objs[cache->len],
> +				req);
> +		if (likely(ret == 0))
> +			cache->len += req;
> +		else
> +			/* Don't spoil the return value */
> +			ret = 0;
> +	}
> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> +
>  	return ret;
>  }
>
> --
> 1.9.1
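
To make the trade-off raised above easier to see, here is a minimal standalone
C sketch. It is not the DPDK implementation: struct cache, ring_dequeue(),
CACHE_SIZE and MAX_BURST are simplified, hypothetical stand-ins, and the
combine-with-cache fallback of the patch is omitted. It only contrasts the
existing backfill-first flow with the ring-first flow proposed by the patch,
counting how many ring dequeues each one issues when the per-lcore cache
starts out empty.

#include <stdint.h>
#include <stdio.h>

#define CACHE_SIZE 8	/* nominal cache size (stand-in for mp->cache_size) */
#define MAX_BURST  8	/* largest request served in this sketch */

/* Simplified per-lcore cache; the real rte_mempool cache is larger and has
 * a flush threshold, which is ignored here. */
struct cache {
	void *objs[CACHE_SIZE + MAX_BURST];
	uint32_t len;
};

/* Toy stand-in for rte_ring_mc_dequeue_bulk(): always succeeds and hands out
 * fake object pointers so the sketch runs standalone; it also counts calls. */
static uintptr_t next_obj = 1;
static unsigned ring_calls;

static int ring_dequeue(void **objs, uint32_t n)
{
	ring_calls++;
	for (uint32_t i = 0; i < n; i++)
		objs[i] = (void *)next_obj++;
	return 0;
}

/* Current flow: when the cache is short, a single ring dequeue fetches the
 * request plus a refill in one go, then the request is served from the cache. */
static int get_bulk_backfill_first(struct cache *c, void **obj_table, uint32_t n)
{
	if (c->len < n) {
		uint32_t req = n + (CACHE_SIZE - c->len);

		if (ring_dequeue(&c->objs[c->len], req) < 0)
			return -1;	/* the real code falls back to the ring here */
		c->len += req;
	}
	for (uint32_t i = 0; i < n; i++)
		obj_table[i] = c->objs[--c->len];
	return 0;
}

/* Proposed flow: when the cache is short, serve the request straight from the
 * ring, then top the cache back up afterwards with a separate ring dequeue. */
static int get_bulk_ring_first(struct cache *c, void **obj_table, uint32_t n)
{
	if (c->len >= n) {
		for (uint32_t i = 0; i < n; i++)
			obj_table[i] = c->objs[--c->len];
	} else if (ring_dequeue(obj_table, n) < 0) {
		return -1;
	}
	/* Refill toward the nominal size after serving the request. */
	if (c->len < CACHE_SIZE &&
	    ring_dequeue(&c->objs[c->len], CACHE_SIZE - c->len) == 0)
		c->len = CACHE_SIZE;
	return 0;	/* a failed refill would not change the return value */
}

int main(void)
{
	struct cache c = { .len = 0 };
	void *burst[MAX_BURST];

	get_bulk_backfill_first(&c, burst, MAX_BURST);
	printf("backfill-first, empty cache: %u ring dequeue(s)\n", ring_calls);

	ring_calls = 0;
	c.len = 0;
	get_bulk_ring_first(&c, burst, MAX_BURST);
	printf("ring-first, empty cache:     %u ring dequeue(s)\n", ring_calls);
	return 0;
}

Built with a plain C99 compiler, the empty-cache case prints one ring dequeue
for the backfill-first flow and two for the ring-first flow; whether the extra
ring operation costs more than the extra copy through the cache is exactly the
question raised in the comments above.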