From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <konstantin.ananyev@intel.com>
Received: from mga14.intel.com (mga14.intel.com [192.55.52.115])
 by dpdk.org (Postfix) with ESMTP id CA7B711F5
 for <dev@dpdk.org>; Thu,  2 Jul 2015 19:08:02 +0200 (CEST)
Received: from orsmga001.jf.intel.com ([10.7.209.18])
 by fmsmga103.fm.intel.com with ESMTP; 02 Jul 2015 10:08:01 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.15,394,1432623600"; d="scan'208";a="721870050"
Received: from irsmsx153.ger.corp.intel.com ([163.33.192.75])
 by orsmga001.jf.intel.com with ESMTP; 02 Jul 2015 10:08:00 -0700
Received: from irsmsx105.ger.corp.intel.com ([169.254.7.245]) by
 IRSMSX153.ger.corp.intel.com ([169.254.9.140]) with mapi id 14.03.0224.002;
 Thu, 2 Jul 2015 18:07:59 +0100
From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: Zoltan Kiss <zoltan.kiss@linaro.org>, "dev@dpdk.org" <dev@dpdk.org>
Thread-Topic: [dpdk-dev] [PATCH v2] mempool: improve cache search
Thread-Index: AQHQs9zucyVAGh5CT0u3owUiz91FAp3IZRhg
Date: Thu, 2 Jul 2015 17:07:59 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836A21C32@irsmsx105.ger.corp.intel.com>
References: <1435258110-17140-1-git-send-email-zoltan.kiss@linaro.org>
 <1435741430-2088-1-git-send-email-zoltan.kiss@linaro.org>
In-Reply-To: <1435741430-2088-1-git-send-email-zoltan.kiss@linaro.org>
Accept-Language: en-IE, en-US
Content-Language: en-US
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
x-originating-ip: [163.33.239.180]
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
Subject: Re: [dpdk-dev] [PATCH v2] mempool: improve cache search
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
X-List-Received-Date: Thu, 02 Jul 2015 17:08:04 -0000



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Wednesday, July 01, 2015 10:04 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search
>
> The current way has a few problems:
>
> - if cache->len < n, we copy our elements into the cache first, then
>   into obj_table, that's unnecessary
> - if n >= cache_size (or the backfill fails), and we can't fulfil the
>   request from the ring alone, we don't try to combine with the cache
> - if refill fails, we don't return anything, even if the ring has enough
>   for our request
>
> This patch rewrites it severely:
> - at the first part of the function we only try the cache if cache->len < n
> - otherwise take our elements straight from the ring
> - if that fails but we have something in the cache, try to combine them
> - the refill happens at the end, and its failure doesn't modify our return
>   value
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
> ---
> v2:
> - fix subject
> - add unlikely for branch where request is fulfilled both from cache and ring
>
>  lib/librte_mempool/rte_mempool.h | 63 +++++++++++++++++++++++++---------------
>  1 file changed, 39 insertions(+), 24 deletions(-)
>
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 6d4ce9a..1e96f03 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -947,34 +947,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>  	unsigned lcore_id = rte_lcore_id();
>  	uint32_t cache_size = mp->cache_size;
>
> -	/* cache is not enabled or single consumer */
> +	cache = &mp->local_cache[lcore_id];
> +	/* cache is not enabled or single consumer or not enough */
>  	if (unlikely(cache_size == 0 || is_mc == 0 ||
> -		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
> +		     cache->len < n || lcore_id >= RTE_MAX_LCORE))
>  		goto ring_dequeue;
>
> -	cache = &mp->local_cache[lcore_id];
>  	cache_objs = cache->objs;
>
> -	/* Can this be satisfied from the cache? */
> -	if (cache->len < n) {
> -		/* No. Backfill the cache first, and then fill from it */
> -		uint32_t req = n + (cache_size - cache->len);
> -
> -		/* How many do we require i.e. number to fill the cache + the request */
> -		ret = rte_ring_mc_dequeue_bulk(mp->ring, &cache->objs[cache->len], req);
> -		if (unlikely(ret < 0)) {
> -			/*
> -			 * In the offchance that we are buffer constrained,
> -			 * where we are not able to allocate cache + n, go to
> -			 * the ring directly. If that fails, we are truly out of
> -			 * buffers.
> -			 */
> -			goto ring_dequeue;
> -		}
> -
> -		cache->len += req;
> -	}
> -
>  	/* Now fill in the response ... */
>  	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
>  		*obj_table = cache_objs[len];
> @@ -983,7 +963,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
>
>  	__MEMPOOL_STAT_ADD(mp, get_success, n);
>=20
> -	return 0;
> +	ret = 0;
> +	goto cache_refill;
>
>  ring_dequeue:
>  #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> @@ -994,11 +975,45 @@ ring_dequeue:
>  	else
>  		ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +	if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) {
> +		uint32_t req = n - cache->len;
> +
> +		ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
> +		if (ret == 0) {
> +			cache_objs = cache->objs;
> +			obj_table += req;
> +			for (index =3D 0; index < cache->len;
> +			     ++index, ++obj_table)
> +				*obj_table =3D cache_objs[index];
> +			cache->len = 0;
> +		}
> +	}
> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> +
>  	if (ret < 0)
>  		__MEMPOOL_STAT_ADD(mp, get_fail, n);
>  	else
>  		__MEMPOOL_STAT_ADD(mp, get_success, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +cache_refill:

Ok, so if I get things right: if the lcore runs out of entries in the cache,
then on the next __mempool_get_bulk() it has to do ring_dequeue() twice:
1. to satisfy the user request
2. to refill the cache.
Right?
If that's so, then I think the current approach
(ring_dequeue() once to refill the cache, then copy the entries from the cache to the user)
is the cheaper (faster) one for many cases.
Especially when the same pool is shared between multiple threads (each extra
ring access is then a contended MC dequeue), for example when a thread is
doing RX only (no TX).
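
Just to make the cost comparison concrete, below is a toy model (a sketch,
not DPDK code; ring_dequeue() is a stand-in for rte_ring_mc_dequeue_bulk()
and is assumed to always succeed) that counts ring accesses in the two flows
once the cache has been drained and n < cache_size:

/*
 * Toy model, not DPDK code: ring_dequeue() stands in for
 * rte_ring_mc_dequeue_bulk() and is assumed to always succeed;
 * we only count how often the (shared, contended) ring is touched
 * when the cache is drained and the caller asks for n < cache_size.
 */
#include <stdio.h>

struct cache { unsigned int len; unsigned int size; };

static unsigned int ring_calls;

static void ring_dequeue(unsigned int n)
{
	(void)n;
	ring_calls++;		/* pretend it always delivers n objects */
}

/* current code: one ring access backfills the cache, then serve from it */
static void old_get_bulk(struct cache *c, unsigned int n)
{
	if (c->len < n) {
		ring_dequeue(n + (c->size - c->len));
		c->len = c->size + n;
	}
	c->len -= n;
}

/* patched code: serve the request from the ring, refill in a second access */
static void new_get_bulk(struct cache *c, unsigned int n)
{
	if (c->len >= n)
		c->len -= n;		/* fast path: cache alone is enough */
	else
		ring_dequeue(n);	/* 1st ring access: the user request */
	if (c->len < n) {
		ring_dequeue(c->size - c->len);	/* 2nd ring access: the refill */
		c->len = c->size;
	}
}

int main(void)
{
	struct cache c = { .len = 0, .size = 512 };

	old_get_bulk(&c, 32);
	printf("old flow: %u ring access(es)\n", ring_calls);	/* prints 1 */

	c.len = 0;
	ring_calls = 0;
	new_get_bulk(&c, 32);
	printf("new flow: %u ring access(es)\n", ring_calls);	/* prints 2 */
	return 0;
}

So for the drained-cache case the patch trades one larger ring dequeue for
two smaller ones.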


> +	/* If previous dequeue was OK and we have less than n, start refill */
> +	if (ret == 0 && cache_size > 0 && cache->len < n) {
> +		uint32_t req = cache_size - cache->len;


It could be that n > cache_size.
For that case there is probably no point in refilling the cache, as you took
the entries straight from the ring and the cache was left intact.
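
Something along these lines (an untested sketch, the helper name is made up;
it just adds an "n <= cache_size" check to the refill condition quoted above)
would express that:

#include <stdint.h>

/*
 * Untested sketch: refill only when the request could have been served
 * from a full cache (n <= cache_size); for larger requests the objects
 * came straight from the ring and the cache was left as it was, so
 * topping it up buys nothing here.
 */
static inline int
should_refill_cache(int ret, uint32_t cache_size, uint32_t cache_len,
		    unsigned int n)
{
	return ret == 0 && cache_size > 0 && n <= cache_size && cache_len < n;
}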

Konstantin

> +
> +		cache_objs = cache->objs;
> +		ret = rte_ring_mc_dequeue_bulk(mp->ring,
> +					       &cache->objs[cache->len],
> +					       req);
> +		if (likely(ret == 0))
> +			cache->len += req;
> +		else
> +			/* Don't spoil the return value */
> +			ret =3D 0;
> +	}
> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
> +
>  	return ret;
>  }
>
> --
> 1.9.1