From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: Lazaros Koromilas, "dev@dpdk.org"
Date: Mon, 18 Apr 2016 13:17:02 +0000
Message-ID: <2601191342CEEE43887BDE71AB97725836B3F4BA@irsmsx105.ger.corp.intel.com>
References: <1459784610-14008-1-git-send-email-l@nofutznetworks.com>
In-Reply-To: <1459784610-14008-1-git-send-email-l@nofutznetworks.com>
Subject: Re: [dpdk-dev] [PATCH v2 1/2] mempool: allow for user-owned mempool caches

Hi Lazaros,

Looks ok to me in general, a few comments below.
One more generic question - did you observe any performance impact
caused by these changes?

Konstantin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Lazaros Koromilas
> Sent: Monday, April 04, 2016 4:43 PM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2 1/2] mempool: allow for user-owned mempool caches
>
> The mempool cache is only available to EAL threads as a per-lcore
> resource. Change this so that the user can create and provide their own
> cache on mempool get and put operations. This works with non-EAL threads
> too. This commit introduces the new API calls:
>
>   rte_mempool_cache_create(size, socket_id)
>   rte_mempool_cache_flush(cache, mp)
>   rte_mempool_cache_free(cache)
>   rte_mempool_default_cache(mp, lcore_id)
>   rte_mempool_generic_put(mp, obj_table, n, cache, is_mp)
>   rte_mempool_generic_get(mp, obj_table, n, cache, is_mc)
>
> Removes the API calls:
>
>   rte_mempool_sp_put_bulk(mp, obj_table, n)
>   rte_mempool_sc_get_bulk(mp, obj_table, n)
>   rte_mempool_sp_put(mp, obj)
>   rte_mempool_sc_get(mp, obj)

Hmm, shouldn't we deprecate them first for a release before removing them completely?
For now, you could just make them macros that call the remaining functions, or something along those lines.
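Just to illustrate, one possible shape for such a transition period (a rough
sketch only, assuming the new generic calls keep the signatures listed in the
commit message above; the old SP/SC helpers skipped the per-lcore cache on the
fast path, so a NULL cache keeps that behaviour):

/*
 * Hypothetical one-release compatibility wrappers, not part of the patch:
 * keep the old single-producer/single-consumer names as macros that forward
 * to the new generic calls, going straight to the ring (NULL cache).
 */
#define rte_mempool_sp_put_bulk(mp, obj_table, n) \
        rte_mempool_generic_put(mp, obj_table, n, NULL, 0)
#define rte_mempool_sc_get_bulk(mp, obj_table, n) \
        rte_mempool_generic_get(mp, obj_table, n, NULL, 0)

The single-object rte_mempool_sp_put()/rte_mempool_sc_get() could forward to
the bulk variants the same way, with n = 1.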
>
> And the remaining API calls use the per-lcore default local cache:
>
>   rte_mempool_put_bulk(mp, obj_table, n)
>   rte_mempool_get_bulk(mp, obj_table, n)
>   rte_mempool_put(mp, obj)
>   rte_mempool_get(mp, obj)
>
> Signed-off-by: Lazaros Koromilas
> ---
>  app/test/test_mempool.c | 58 +++++--
>  app/test/test_mempool_perf.c | 46 +++++-
>  lib/librte_eal/common/eal_common_log.c | 8 +-
>  lib/librte_mempool/rte_mempool.c | 76 ++++++++-
>  lib/librte_mempool/rte_mempool.h | 291 +++++++++++++---------------------
>  5 files changed, 275 insertions(+), 204 deletions(-)
>
>
> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
> index 73ca770..4d977c1 100644
> --- a/lib/librte_mempool/rte_mempool.c
> +++ b/lib/librte_mempool/rte_mempool.c
> @@ -375,6 +375,63 @@ rte_mempool_xmem_usage(void *vaddr, uint32_t elt_num, size_t elt_sz,
> return usz;
> }
>
> +static void
> +mempool_cache_init(struct rte_mempool_cache *cache, uint32_t size)
> +{
> + cache->size = size;
> + cache->flushthresh = CALC_CACHE_FLUSHTHRESH(size);
> + cache->len = 0;
> +}
> +
> +/*
> + * Create and initialize a cache for objects that are retrieved from and
> + * returned to an underlying mempool. This structure is identical to the
> + * local_cache[lcore_id] pointed to by the mempool structure.
> + */
> +struct rte_mempool_cache *
> +rte_mempool_cache_create(uint32_t size, int socket_id)
> +{
> + struct rte_mempool_cache *cache;
> +
> + if (size > RTE_MEMPOOL_CACHE_MAX_SIZE) {
> + rte_errno = EINVAL;
> + return NULL;
> + }
> +
> + cache = rte_zmalloc_socket("MEMPOOL_CACHE", sizeof(*cache),
> + RTE_CACHE_LINE_SIZE, socket_id);
> + if (cache == NULL) {
> + RTE_LOG(ERR, MEMPOOL, "Cannot allocate mempool cache!\n");
> + rte_errno = ENOMEM;
> + return NULL;
> + }
> +
> + mempool_cache_init(cache, size);
> +
> + return cache;
> +}
> +
> +/*
> + * Free a cache. It's the responsibility of the user to make sure that any
> + * remaining objects in the cache are flushed to the corresponding
> + * mempool.
> + */
> +void
> +rte_mempool_cache_free(struct rte_mempool_cache *cache)
> +{
> + rte_free(cache);
> +}
> +
> +/*
> + * Put all objects in the cache to the specified mempool's ring.
> + */
> +void
> +rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> + struct rte_mempool *mp)
> +{
> + rte_ring_enqueue_bulk(mp->ring, cache->objs, cache->len);

Shouldn't you also reset cache->len here?
cache->len = 0;

Another thought - it might be that this function deserves to be an inline one.
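Something along these lines, just an untested sketch combining the two
suggestions above (reset the length after the enqueue, and make the function a
static inline in rte_mempool.h):

static inline void
rte_mempool_cache_flush(struct rte_mempool_cache *cache,
        struct rte_mempool *mp)
{
        /* Push everything currently held in the cache back to the ring... */
        rte_ring_enqueue_bulk(mp->ring, cache->objs, cache->len);
        /* ...and mark the cache empty so objects are not enqueued twice. */
        cache->len = 0;
}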
> +}
>
> #ifndef RTE_LIBRTE_XEN_DOM0
> /* stub if DOM0 support not configured */
> struct rte_mempool *
> @@ -448,6 +505,7 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> struct rte_mempool_objsz objsz;
> void *startaddr;
> int page_size = getpagesize();
> + unsigned lcore_id;
>
> /* compilation-time checks */
> RTE_BUILD_BUG_ON((sizeof(struct rte_mempool) &
> @@ -583,8 +641,8 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> mp->elt_size = objsz.elt_size;
> mp->header_size = objsz.header_size;
> mp->trailer_size = objsz.trailer_size;
> + /* Size of default caches, zero means disabled. */
> mp->cache_size = cache_size;
> - mp->cache_flushthresh = CALC_CACHE_FLUSHTHRESH(cache_size);
> mp->private_data_size = private_data_size;
>
> /*
> @@ -594,6 +652,13 @@ rte_mempool_xmem_create(const char *name, unsigned n, unsigned elt_size,
> mp->local_cache = (struct rte_mempool_cache *)
> ((char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, 0));
>
> + /* Init all default caches. */
> + if (cache_size != 0) {
> + for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++)
> + mempool_cache_init(&mp->local_cache[lcore_id],
> + cache_size);
> + }
> +
> /* calculate address of the first element for continuous mempool. */
> obj = (char *)mp + MEMPOOL_HEADER_SIZE(mp, pg_num, cache_size) +
> private_data_size;
> @@ -672,7 +737,7 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp)
> unsigned count = 0;
> unsigned cache_count;
>
> - fprintf(f, " cache infos:\n");
> + fprintf(f, " internal cache infos:\n");
> fprintf(f, " cache_size=%"PRIu32"\n", mp->cache_size);
>
> if (mp->cache_size == 0)
> @@ -680,7 +745,8 @@ rte_mempool_dump_cache(FILE *f, const struct rte_mempool *mp)
>
> for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> cache_count = mp->local_cache[lcore_id].len;
> - fprintf(f, " cache_count[%u]=%u\n", lcore_id, cache_count);
> + fprintf(f, " cache_count[%u]=%"PRIu32"\n",
> + lcore_id, cache_count);
> count += cache_count;
> }
> fprintf(f, " total_cache_count=%u\n", count);
> @@ -760,7 +826,9 @@ mempool_audit_cache(const struct rte_mempool *mp)
> return;
>
> for (lcore_id = 0; lcore_id < RTE_MAX_LCORE; lcore_id++) {
> - if (mp->local_cache[lcore_id].len > mp->cache_flushthresh) {
> + const struct rte_mempool_cache *cache;
> + cache = &mp->local_cache[lcore_id];
> + if (cache->len > cache->flushthresh) {
> RTE_LOG(CRIT, MEMPOOL, "badness on cache[%u]\n",
> lcore_id);
> rte_panic("MEMPOOL: invalid cache len\n");
> diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
> index 8595e77..21d43e2 100644
> --- a/lib/librte_mempool/rte_mempool.h
> +++ b/lib/librte_mempool/rte_mempool.h
> @@ -99,7 +99,9 @@ struct rte_mempool_debug_stats {
> * A structure that stores a per-core object cache.
> */
> struct rte_mempool_cache {
> - unsigned len; /**< Cache len */
> + uint32_t size; /**< Size of the cache */
> + uint32_t flushthresh; /**< Threshold before we flush excess elements */
> + uint32_t len; /**< Current cache count */
> /*
> * Cache is allocated to this size to allow it to overflow in certain
> * cases to avoid needless emptying of cache.
> @@ -182,9 +184,7 @@ struct rte_mempool {
> phys_addr_t phys_addr; /**< Phys. addr. of mempool struct. */
> int flags; /**< Flags of the mempool. */
> uint32_t size; /**< Size of the mempool. */
> - uint32_t cache_size; /**< Size of per-lcore local cache. */
> - uint32_t cache_flushthresh;
> - /**< Threshold before we flush excess elements. */
> + uint32_t cache_size; /**< Size of per-lcore default local cache. */
>
> uint32_t elt_size; /**< Size of an element. */
> uint32_t header_size; /**< Size of header (before elt). */
> @@ -742,6 +742,66 @@ rte_dom0_mempool_create(const char *name, unsigned n, unsigned elt_size,
> void rte_mempool_dump(FILE *f, const struct rte_mempool *mp);
>
> /**
> + * Create a user-owned mempool cache.
> + *
> + * This can be used by non-EAL threads to enable caching when they
> + * interact with a mempool.
> + *
> + * @param size
> + * The size of the mempool cache. See rte_mempool_create()'s cache_size
> + * parameter description for more information. The same limits and
> + * considerations apply here too.
> + * @param socket_id
> + * The socket identifier in the case of NUMA. The value can be
> + * SOCKET_ID_ANY if there is no NUMA constraint for the reserved zone.
> + */
> +struct rte_mempool_cache *
> +rte_mempool_cache_create(uint32_t size, int socket_id);
> +
> +/**
> + * Free a user-owned mempool cache.
> + *
> + * @param cache
> + * A pointer to the mempool cache.
> + */
> +void
> +rte_mempool_cache_free(struct rte_mempool_cache *cache);
> +
> +/**
> + * Flush a user-owned mempool cache to the specified mempool.
> + *
> + * @param cache
> + * A pointer to the mempool cache.
> + * @param mp
> + * A pointer to the mempool.
> + */
> +void
> +rte_mempool_cache_flush(struct rte_mempool_cache *cache,
> + struct rte_mempool *mp);
> +
> +/**
> + * Get a pointer to the per-lcore default mempool cache.
> + *
> + * @param mp
> + * A pointer to the mempool structure.
> + * @param lcore_id
> + * The logical core id.
> + * @return
> + * A pointer to the mempool cache or NULL if disabled or non-EAL thread.
> + */
> +static inline struct rte_mempool_cache *
> +rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
> +{
> + if (mp->cache_size == 0)
> + return NULL;
> +
> + if (lcore_id >= RTE_MAX_LCORE)
> + return NULL;
> +
> + return &mp->local_cache[lcore_id];
> +}
> +
> +/**
> * @internal Put several objects back in the mempool; used internally.
> * @param mp
> * A pointer to the mempool structure.
> @@ -750,33 +810,29 @@ void rte_mempool_dump(FILE *f, const struct rte_mempool *mp);
> * @param n
> * The number of objects to store back in the mempool, must be strictly
> * positive.
> + * @param cache
> + * A pointer to a mempool cache structure. May be NULL if not needed.
> * @param is_mp
> * Mono-producer (0) or multi-producers (1).
> */
> static inline void __attribute__((always_inline))
> -__mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
> - unsigned n, int is_mp)
> +__mempool_generic_put(struct rte_mempool *mp, void * const *obj_table,
> + unsigned n, struct rte_mempool_cache *cache, int is_mp)
> {
> - struct rte_mempool_cache *cache;
> uint32_t index;
> void **cache_objs;
> - unsigned lcore_id = rte_lcore_id();
> - uint32_t cache_size = mp->cache_size;
> - uint32_t flushthresh = mp->cache_flushthresh;
>
> /* increment stat now, adding in mempool always success */
> __MEMPOOL_STAT_ADD(mp, put, n);
>
> - /* cache is not enabled or single producer or non-EAL thread */
> - if (unlikely(cache_size == 0 || is_mp == 0 ||
> - lcore_id >= RTE_MAX_LCORE))
> + /* No cache provided or cache is not enabled or single producer */
> + if (unlikely(cache == NULL || cache->size == 0 || is_mp == 0))

Here and in other places, do we really need that check: 'cache->size == 0'?
I mean, at mempool_create() if mp->cache_size is zero, then we wouldn't allocate
any local_cache[] at all, and cache would be 0 anyway.
Also, in rte_mempool_cache_create() we can check that size > 0 and return
-EINVAL otherwise.
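In other words, the check could shrink to something like this (sketch only):

        /* No cache provided or single producer */
        if (unlikely(cache == NULL || is_mp == 0))
                goto ring_enqueue;

and rte_mempool_cache_create() could reject a zero size up front:

        /* A zero-sized user-owned cache makes no sense, reject it early. */
        if (size == 0 || size > RTE_MEMPOOL_CACHE_MAX_SIZE) {
                rte_errno = EINVAL;
                return NULL;
        }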
> goto ring_enqueue;
>
> /* Go straight to ring if put would overflow mem allocated for cache */
> if (unlikely(n > RTE_MEMPOOL_CACHE_MAX_SIZE))
> goto ring_enqueue;
>
> - cache = &mp->local_cache[lcore_id];
> cache_objs = &cache->objs[cache->len];
>
> /*
> @@ -792,10 +848,10 @@ __mempool_put_bulk(struct rte_mempool *mp, void * const *obj_table,
>
> cache->len += n;
>
> - if (cache->len >= flushthresh) {
> - rte_ring_mp_enqueue_bulk(mp->ring, &cache->objs[cache_size],
> - cache->len - cache_size);
> - cache->len = cache_size;
> + if (cache->len >= cache->flushthresh) {
> + rte_ring_mp_enqueue_bulk(mp->ring, &cache->objs[cache->size],
> + cache->len - cache->size);
> + cache->len = cache->size;
> }
>
> return;
> @@ -820,41 +876,27 @@ ring_enqueue:
> #endif
> }
>
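For reference, here is a minimal usage sketch of the proposed user-owned cache
API as I read it from the commit message above (hypothetical flow for a non-EAL
thread, error handling mostly omitted, mp assumed to be an existing mempool):

        struct rte_mempool_cache *cache;
        void *objs[32];

        /* Create a private cache; SOCKET_ID_ANY if no NUMA constraint. */
        cache = rte_mempool_cache_create(32, SOCKET_ID_ANY);
        if (cache == NULL)
                return;

        /* Get/put through the user-owned cache (multi-consumer/producer). */
        if (rte_mempool_generic_get(mp, objs, 32, cache, 1) == 0)
                rte_mempool_generic_put(mp, objs, 32, cache, 1);

        /* Return any cached objects to the pool before freeing the cache. */
        rte_mempool_cache_flush(cache, mp);
        rte_mempool_cache_free(cache);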