From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Lazaros Koromilas, "dev@dpdk.org"
Date: Mon, 18 Apr 2016 13:17:02 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B3F4BA@irsmsx105.ger.corp.intel.com>
References: <1459784610-14008-1-git-send-email-l@nofutznetworks.com>
In-Reply-To: <1459784610-14008-1-git-send-email-l@nofutznetworks.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/2] mempool: allow for user-owned mempool caches

Hi Lazaros,

Looks ok to me in general, a few comments below.
One more generic question - did you observe any performance impact
caused by these changes?

Konstantin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Lazaros Koromilas
> Sent: Monday, April 04, 2016 4:43 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/2] mempool: allow for user-owned mempool caches
>
> The mempool cache is only available to EAL threads as a per-lcore
> resource. Change this so that the user can create and provide their own
> cache on mempool get and put operations. This works with non-EAL threads
> too. This commit introduces the new API calls:
>
>   rte_mempool_cache_create(size, socket_id)
>   rte_mempool_cache_flush(cache, mp)
>   rte_mempool_cache_free(cache)
>   rte_mempool_default_cache(mp, lcore_id)
>   rte_mempool_generic_put(mp, obj_table, n, cache, is_mp)
>   rte_mempool_generic_get(mp, obj_table, n, cache, is_mc)
>
> Removes the API calls:
>
>   rte_mempool_sp_put_bulk(mp, obj_table, n)
>   rte_mempool_sc_get_bulk(mp, obj_table, n)
>   rte_mempool_sp_put(mp, obj)
>   rte_mempool_sc_get(mp, obj)

Hmm, shouldn't we deprecate them first for a release before removing them completely?
For now, you could just make them macros that call the remaining functions, or something along those lines.
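Just to illustrate, one possible shape for such a transition period (a rough
sketch only, assuming the new generic calls keep the signatures listed in the
commit message above; the old SP/SC helpers skipped the per-lcore cache on the
fast path, so a NULL cache keeps that behaviour):

/*
 * Hypothetical one-release compatibility wrappers, not part of the patch:
 * keep the old single-producer/single-consumer names as macros that forward
 * to the new generic calls, going straight to the ring (NULL cache).
 */
#define rte_mempool_sp_put_bulk(mp, obj_table, n) \
        rte_mempool_generic_put(mp, obj_table, n, NULL, 0)
#define rte_mempool_sc_get_bulk(mp, obj_table, n) \
        rte_mempool_generic_get(mp, obj_table, n, NULL, 0)

The single-object rte_mempool_sp_put()/rte_mempool_sc_get() could forward to
the bulk variants the same way, with n = 1.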
>
> And the remaining API calls use the per-lcore default local cache:
>
>   rte_mempool_put_bulk(mp, obj_table, n)
>   rte_mempool_get_bulk(mp, obj_table, n)
>   rte_mempool_put(mp, obj)
>   rte_mempool_get(mp, obj)
>
> Signed-off-by: Lazaros Koromilas
> ---
>  app/test/test_mempool.c | 58 +++++--
>  app/test/test_mempool_perf.c | 46 +++++-
>  lib/librte_eal/common/eal_common_log.c | 8 +-
>  lib/librte_mempool/rte_mempool.c | 76 ++++++++-
>  lib/librte_mempool/rte_mempool.h | 291 +++++++++++++---------------------
>  5 files changed, 275 insertions(+), 204 deletions(-)
>
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 73ca770..4d977c1 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -375,6 +375,63 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
> return usz;
> }
>
> +static void
> +mempool_cache_init(struct rte_mempool_cache *cache, uint32_t size)
> +{
> + cache->size = size;
> + cache->flushthresh = CALC_CACHE_FLUSHTHRESH(size);
> + cache->len = 0;
> +}
> +
> +/*
> + * Create and initialize a cache for objects that are retrieved from and
> + * returned to an underlying mempool. This structure is identical to the
> + * local_cache[lcore_id] pointed to by the mempool structure.
> + */
> +struct rte_mempool_cache *
> +rte_mempool_cache_create(uint32_t size, int socket_id)
> +{
> + struct rte_mempool_cache *cache;
> +
> + if (size > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> + rte_errno = EINVAL;
> + return NULL;
> + }
> +
> + cache = rte_zmalloc_socket("MEMPOOL_CACHE", sizeof(*cache),
> + RTE_CACHE_LINE_SIZE, socket_id);
> + if (cache == NULL) {
> + RTE_LOG(ERR, MEMPOOL, "Cannot allocate mempool cache!\n");
> + rte_errno = ENOMEM;
> + return NULL;
> + }
> +
> + mempool_cache_init(cache, size);
> +
> + return cache;
> +}
> +
> +/*
> + * Free a cache. It's the responsibility of the user to make sure that any
> + * remaining objects in the cache are flushed to the corresponding
> + * mempool.
> + */
> +void
> +rte_mempool_cache_free(struct rte_mempool_cache *cache)
> +{
> + rte_free(cache);
> +}
> +
> +/*
> + * Put all objects in the cache to the specified mempool's ring.
> + */
> +void
> +rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> + struct rte_mempool *mp)
> +{
> + rte_ring_enqueue_bulk(mp->ring, cache->objs, cache->len);

Shouldn't you also reset cache->len here?
cache->len = 0;

Another thought - it might be that this function deserves to be an inline one.
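Something along these lines, just an untested sketch combining the two
suggestions above (reset the length after the enqueue, and make the function a
static inline in rte_mempool.h):

static inline void
rte_mempool_cache_flush(struct rte_mempool_cache *cache,
        struct rte_mempool *mp)
{
        /* Push everything currently held in the cache back to the ring... */
        rte_ring_enqueue_bulk(mp->ring, cache->objs, cache->len);
        /* ...and mark the cache empty so objects are not enqueued twice. */
        cache->len = 0;
}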
> +}
>
> #ifndef RTE_LIBRTE_XEN_DOM0
> /* stub if DOM0 support not configured */
> struct rte_mempool *
> @@ -448,6 +505,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> struct rte_mempool_objsz objsz;
> void *startaddr;
> int page_size = getpagesize();
> + unsigned lcore_id;
>
> /* compilation-time checks */
> RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> @@ -583,8 +641,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> mp->elt_size = objsz.elt_size;
> mp->header_size = objsz.header_size;
> mp->trailer_size = objsz.trailer_size;
> + /* Size of default caches, zero means disabled. */
> mp->cache_size = cache_size;
> - mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
> mp->private_data_size = private_data_size;
>
> /*
> @@ -594,6 +652,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> mp->local_cache = (struct rte_mempool_cache *)
> ((char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>
> + /* Init all default caches. */
> + if (cache_size != 0) {
> + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> + mempool_cache_init(&mp->local_cache[lcore_id],
> + cache_size);
> + }
> +
> /* calculate address of the first element for continuous mempool. */
> obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
> private_data_size;
> @@ -672,7 +737,7 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp)
> unsigned count = 0;
> unsigned cache_count;
>
> - fprintf(f, " cache infos:\n");
> + fprintf(f, " internal cache infos:\n");
> fprintf(f, " cache_size=%"PRIu32"\n", mp->cache_size);
>
> if (mp->cache_size == 0)
> @@ -680,7 +745,8 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp)
>
> for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> cache_count = mp->local_cache[lcore_id].len;
> - fprintf(f, " cache_count[%u]=%u\n", lcore_id, cache_count);
> + fprintf(f, " cache_count[%u]=%"PRIu32"\n",
> + lcore_id, cache_count);
> count += cache_count;
> }
> fprintf(f, " total_cache_count=%u\n", count);
> @@ -760,7 +826,9 @@ mempool_audit_cache(const struct rte_mempool *mp)
> return;
>
> for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> - if (mp->local_cache[lcore_id].len > mp->cache_flushthresh) {
> + const struct rte_mempool_cache *cache;
> + cache = &mp->local_cache[lcore_id];
> + if (cache->len > cache->flushthresh) {
> RTE_LOG(CRIT, MEMPOOL, "badness on cache[%u]\n",
> lcore_id);
> rte_panic("MEMPOOL: invalid cache len\n");
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 8595e77..21d43e2 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -99,7 +99,9 @@ struct rte_mempool_debug_stats {
> * A structure that stores a per-core object cache.
> */
> struct rte_mempool_cache {
> - unsigned len; /**< Cache len */
> + uint32_t size; /**< Size of the cache */
> + uint32_t flushthresh; /**< Threshold before we flush excess elements */
> + uint32_t len; /**< Current cache count */
> /*
> * Cache is allocated to this size to allow it to overflow in certain
> * cases to avoid needless emptying of cache.
> @@ -182,9 +184,7 @@ struct rte_mempool {
> phys_addr_t phys_addr; /**< Phys. addr. of mempool struct. */
> int flags; /**< Flags of the mempool. */
> uint32_t size; /**< Size of the mempool. */
> - uint32_t cache_size; /**< Size of per-lcore local cache. */
> - uint32_t cache_flushthresh;
> - /**< Threshold before we flush excess elements. */
> + uint32_t cache_size; /**< Size of per-lcore default local cache. */
>
> uint32_t elt_size; /**< Size of an element. */
> uint32_t header_size; /**< Size of header (before elt). */
> @@ -742,6 +742,66 @@ rte_dom0_mempool_create(const char *name, unsigned n, unsigned elt_size,
> void rte_mempool_dump(FILE *f, const struct rte_mempool *mp);
>
> /**
> + * Create a user-owned mempool cache.
> + *
> + * This can be used by non-EAL threads to enable caching when they
> + * interact with a mempool.
> + *
> + * @param size
> + * The size of the mempool cache. See rte_mempool_create()'s cache_size
> + * parameter description for more information. The same limits and
> + * considerations apply here too.
> + * @param socket_id
> + * The socket identifier in the case of NUMA. The value can be
> + * SOCKET_ID_ANY if there is no NUMA constraint for the reserved zone.
> + */
> +struct rte_mempool_cache *
> +rte_mempool_cache_create(uint32_t size, int socket_id);
> +
> +/**
> + * Free a user-owned mempool cache.
> + *
> + * @param cache
> + * A pointer to the mempool cache.
> + */
> +void
> +rte_mempool_cache_free(struct rte_mempool_cache *cache);
> +
> +/**
> + * Flush a user-owned mempool cache to the specified mempool.
> + *
> + * @param cache
> + * A pointer to the mempool cache.
> + * @param mp
> + * A pointer to the mempool.
> + */
> +void
> +rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> + struct rte_mempool *mp);
> +
> +/**
> + * Get a pointer to the per-lcore default mempool cache.
> + *
> + * @param mp
> + * A pointer to the mempool structure.
> + * @param lcore_id
> + * The logical core id.
> + * @return
> + * A pointer to the mempool cache or NULL if disabled or non-EAL thread.
> + */
> +static inline struct rte_mempool_cache *
> +rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
> +{
> + if (mp->cache_size == 0)
> + return NULL;
> +
> + if (lcore_id >= RTE_MAX_LCORE)
> + return NULL;
> +
> + return &mp->local_cache[lcore_id];
> +}
> +
> +/**
> * @internal Put several objects back in the mempool; used internally.
> * @param mp
> * A pointer to the mempool structure.
> @@ -750,33 +810,29 @@ void rte_mempool_dump(FILE *f, const struct rte_mempool *mp);
> * @param n
> * The number of objects to store back in the mempool, must be strictly
> * positive.
> + * @param cache
> + * A pointer to a mempool cache structure. May be NULL if not needed.
> * @param is_mp
> * Mono-producer (0) or multi-producers (1).
> */
> static inline void __attribute__((always_inline))
> -__mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
> - unsigned n, int is_mp)
> +__mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
> + unsigned n, struct rte_mempool_cache *cache, int is_mp)
> {
> - struct rte_mempool_cache *cache;
> uint32_t index;
> void **cache_objs;
> - unsigned lcore_id = rte_lcore_id();
> - uint32_t cache_size = mp->cache_size;
> - uint32_t flushthresh = mp->cache_flushthresh;
>
> /* increment stat now, adding in mempool always success */
> __MEMPOOL_STAT_ADD(mp, put, n);
>
> - /* cache is not enabled or single producer or non-EAL thread */
> - if (unlikely(cache_size == 0 || is_mp == 0 ||
> - lcore_id >= RTE_MAX_LCORE))
> + /* No cache provided or cache is not enabled or single producer */
> + if (unlikely(cache == NULL || cache->size == 0 || is_mp == 0))

Here and in other places, do we really need that check: 'cache->size == 0'?
I mean, at mempool_create() if mp->cache_size is zero, then we wouldn't allocate
any local_cache[] at all, and cache would be 0 anyway.
Also, in rte_mempool_cache_create() we can check that size > 0 and return
-EINVAL otherwise.
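In other words, the check could shrink to something like this (sketch only):

        /* No cache provided or single producer */
        if (unlikely(cache == NULL || is_mp == 0))
                goto ring_enqueue;

and rte_mempool_cache_create() could reject a zero size up front:

        /* A zero-sized user-owned cache makes no sense, reject it early. */
        if (size == 0 || size > RTE_MEMPOOL_CACHE_MAX_SIZE) {
                rte_errno = EINVAL;
                return NULL;
        }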
> goto ring_enqueue;
>
> /* Go straight to ring if put would overflow mem allocated for cache */
> if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE))
> goto ring_enqueue;
>
> - cache = &mp->local_cache[lcore_id];
> cache_objs = &cache->objs[cache->len];
>
> /*
> @@ -792,10 +848,10 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>
> cache->len += n;
>
> - if (cache->len >= flushthresh) {
> - rte_ring_mp_enqueue_bulk(mp->ring, &cache->objs[cache_size],
> - cache->len - cache_size);
> - cache->len = cache_size;
> + if (cache->len >= cache->flushthresh) {
> + rte_ring_mp_enqueue_bulk(mp->ring, &cache->objs[cache->size],
> + cache->len - cache->size);
> + cache->len = cache->size;
> }
>
> return;
> @@ -820,41 +876,27 @@ ring_enqueue:
> #endif
> }
>
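For reference, here is a minimal usage sketch of the proposed user-owned cache
API as I read it from the commit message above (hypothetical flow for a non-EAL
thread, error handling mostly omitted, mp assumed to be an existing mempool):

        struct rte_mempool_cache *cache;
        void *objs[32];

        /* Create a private cache; SOCKET_ID_ANY if no NUMA constraint. */
        cache = rte_mempool_cache_create(32, SOCKET_ID_ANY);
        if (cache == NULL)
                return;

        /* Get/put through the user-owned cache (multi-consumer/producer). */
        if (rte_mempool_generic_get(mp, objs, 32, cache, 1) == 0)
                rte_mempool_generic_put(mp, objs, 32, cache, 1);

        /* Return any cached objects to the pool before freeing the cache. */
        rte_mempool_cache_flush(cache, mp);
        rte_mempool_cache_free(cache);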