* [dpdk-dev] [PATCH] mempool: improbe cache search

From: Zoltan Kiss @ 2015-06-25 18:48 UTC
To: dev

The current way has a few problems:

- if cache->len < n, we copy our elements into the cache first, then
  into obj_table, that's unnecessary
- if n >= cache_size (or the backfill fails), and we can't fulfil the
  request from the ring alone, we don't try to combine with the cache
- if refill fails, we don't return anything, even if the ring has enough
  for our request

This patch rewrites it substantially:
- in the first part of the function we only try the cache if cache->len >= n
- otherwise take our elements straight from the ring
- if that fails but we have something in the cache, try to combine them
- the refill happens at the end, and its failure doesn't modify our return
  value

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
---
 lib/librte_mempool/rte_mempool.h | 63 +++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index a8054e1..896946c 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -948,34 +948,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
 	unsigned lcore_id = rte_lcore_id();
 	uint32_t cache_size = mp->cache_size;
 
-	/* cache is not enabled or single consumer */
+	cache = &mp->local_cache[lcore_id];
+	/* cache is not enabled or single consumer or not enough */
 	if (unlikely(cache_size == 0 || is_mc == 0 ||
-		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
+		     cache->len < n || lcore_id >= RTE_MAX_LCORE))
 		goto ring_dequeue;
 
-	cache = &mp->local_cache[lcore_id];
 	cache_objs = cache->objs;
 
-	/* Can this be satisfied from the cache? */
-	if (cache->len < n) {
-		/* No. Backfill the cache first, and then fill from it */
-		uint32_t req = n + (cache_size - cache->len);
-
-		/* How many do we require i.e. number to fill the cache + the request */
-		ret = rte_ring_mc_dequeue_bulk(mp->ring, &cache->objs[cache->len], req);
-		if (unlikely(ret < 0)) {
-			/*
-			 * In the offchance that we are buffer constrained,
-			 * where we are not able to allocate cache + n, go to
-			 * the ring directly. If that fails, we are truly out of
-			 * buffers.
-			 */
-			goto ring_dequeue;
-		}
-
-		cache->len += req;
-	}
-
 	/* Now fill in the response ... */
 	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
 		*obj_table = cache_objs[len];
@@ -984,7 +964,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
 
 	__MEMPOOL_STAT_ADD(mp, get_success, n);
 
-	return 0;
+	ret = 0;
+	goto cache_refill;
 
 ring_dequeue:
 #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
@@ -995,11 +976,45 @@ ring_dequeue:
 	else
 		ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
 
+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+	if (ret < 0 && is_mc == 1 && cache->len > 0) {
+		uint32_t req = n - cache->len;
+
+		ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
+		if (ret == 0) {
+			cache_objs = cache->objs;
+			obj_table += req;
+			for (index = 0; index < cache->len;
+			     ++index, ++obj_table)
+				*obj_table = cache_objs[index];
+			cache->len = 0;
+		}
+	}
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
 	if (ret < 0)
 		__MEMPOOL_STAT_ADD(mp, get_fail, n);
 	else
 		__MEMPOOL_STAT_ADD(mp, get_success, n);
 
+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+cache_refill:
+	/* If previous dequeue was OK and we have less than n, start refill */
+	if (ret == 0 && cache_size > 0 && cache->len < n) {
+		uint32_t req = cache_size - cache->len;
+
+		cache_objs = cache->objs;
+		ret = rte_ring_mc_dequeue_bulk(mp->ring,
+				&cache->objs[cache->len],
+				req);
+		if (likely(ret == 0))
+			cache->len += req;
+		else
+			/* Don't spoil the return value */
+			ret = 0;
+	}
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
 	return ret;
 }
 
-- 
1.9.1
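For readers following the diff, here is a minimal single-threaded model of the get path this patch describes: serve from the per-lcore cache only when it already holds at least n objects, otherwise take everything from the ring, fall back to combining ring and cache, then attempt a best-effort refill. It is a sketch under stated assumptions, not DPDK code: `toy_ring`, `toy_cache`, `ring_dequeue_bulk()` and `toy_get_bulk()` are hypothetical names, the ring is a plain LIFO array with the same all-or-nothing semantics as rte_ring_mc_dequeue_bulk(), and there is no concurrency or lcore lookup.

```c
#include <assert.h>
#include <string.h>

#define CACHE_SIZE 4 /* hypothetical per-lcore cache capacity */
#define RING_CAP 64  /* hypothetical ring capacity */

/* Hypothetical stand-in for the shared ring: a LIFO array of pointers. */
struct toy_ring {
	void *objs[RING_CAP];
	unsigned len;
};

/* Hypothetical stand-in for the per-lcore cache. */
struct toy_cache {
	void *objs[CACHE_SIZE];
	unsigned len;
};

/* All-or-nothing bulk dequeue: 0 on success, -1 if fewer than n are left. */
static int ring_dequeue_bulk(struct toy_ring *r, void **dst, unsigned n)
{
	if (r->len < n)
		return -1;
	for (unsigned i = 0; i < n; i++)
		dst[i] = r->objs[--r->len];
	return 0;
}

static int toy_get_bulk(struct toy_ring *r, struct toy_cache *c,
			void **obj_table, unsigned n)
{
	int ret;

	assert(c->len <= CACHE_SIZE); /* invariant kept by the refill below */

	if (c->len >= n) {
		/* 1. The cache alone can satisfy the request. */
		for (unsigned i = 0; i < n; i++)
			obj_table[i] = c->objs[--c->len];
		ret = 0;
	} else {
		/* 2. Not enough cached: take everything from the ring. */
		ret = ring_dequeue_bulk(r, obj_table, n);
		/* 3. Ring alone is short: combine it with the whole cache. */
		if (ret < 0 && c->len > 0) {
			unsigned req = n - c->len;

			ret = ring_dequeue_bulk(r, obj_table, req);
			if (ret == 0) {
				memcpy(obj_table + req, c->objs,
				       c->len * sizeof(void *));
				c->len = 0;
			}
		}
	}

	/* 4. Best-effort refill; its failure never changes ret. */
	if (ret == 0 && c->len < n &&
	    ring_dequeue_bulk(r, &c->objs[c->len], CACHE_SIZE - c->len) == 0)
		c->len = CACHE_SIZE;

	return ret;
}
```

Because the toy cache array holds at most `CACHE_SIZE` objects, `CACHE_SIZE - c->len` cannot wrap around; the assert states that invariant explicitly. The combine step appends the cached objects after the `n - cache->len` taken from the ring, in the same order as the patch's copy loop.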
* Re: [dpdk-dev] [PATCH] mempool: improbe cache search

From: Olivier MATZ @ 2015-06-30 11:58 UTC
To: Zoltan Kiss, dev

Hi Zoltan,

On 06/25/2015 08:48 PM, Zoltan Kiss wrote:
> The current way has a few problems:
>
> - if cache->len < n, we copy our elements into the cache first, then
>   into obj_table, that's unnecessary
> - if n >= cache_size (or the backfill fails), and we can't fulfil the
>   request from the ring alone, we don't try to combine with the cache
> - if refill fails, we don't return anything, even if the ring has enough
>   for our request
>
> This patch rewrites it substantially:
> - in the first part of the function we only try the cache if cache->len >= n
> - otherwise take our elements straight from the ring
> - if that fails but we have something in the cache, try to combine them
> - the refill happens at the end, and its failure doesn't modify our return
>   value

Indeed, it looks easier to read that way. I checked the performance with
"mempool_perf_autotest" of app/test and it shows that there is no
regression (it is even slightly better in some test cases).

There is a small typo in the title: s/improbe/improve

Please see also a comment below.

> [...]
> @@ -995,11 +976,45 @@ ring_dequeue:
>  	else
>  		ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +	if (ret < 0 && is_mc == 1 && cache->len > 0) {

if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) ?

> [...]
>  	if (ret < 0)
>  		__MEMPOOL_STAT_ADD(mp, get_fail, n);
>  	else
>  		__MEMPOOL_STAT_ADD(mp, get_success, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +cache_refill:
> +	/* If previous dequeue was OK and we have less than n, start refill */
> +	if (ret == 0 && cache_size > 0 && cache->len < n) {

Not sure whether it's likely or unlikely there. I'd tend to say unlikely,
as the cache size is probably big compared to n most of the time.

I don't know if it would have a real performance impact, though, but
I think it won't hurt.

Regards,
Olivier
* Re: [dpdk-dev] [PATCH] mempool: improbe cache search

From: Zoltan Kiss @ 2015-06-30 13:59 UTC
To: Olivier MATZ, dev

On 30/06/15 12:58, Olivier MATZ wrote:
> Hi Zoltan,
>
> On 06/25/2015 08:48 PM, Zoltan Kiss wrote:
>> The current way has a few problems:
>> [...]
>
> Indeed, it looks easier to read that way. I checked the performance with
> "mempool_perf_autotest" of app/test and it shows that there is no
> regression (it is even slightly better in some test cases).
>
> There is a small typo in the title: s/improbe/improve

Yes, I'll fix that.

> Please see also a comment below.
>
>> [...]
>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>> +	if (ret < 0 && is_mc == 1 && cache->len > 0) {
>
> if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) ?

Ok

>> [...]
>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>> +cache_refill:
>> +	/* If previous dequeue was OK and we have less than n, start refill */
>> +	if (ret == 0 && cache_size > 0 && cache->len < n) {
>
> Not sure whether it's likely or unlikely there. I'd tend to say unlikely,
> as the cache size is probably big compared to n most of the time.
>
> I don't know if it would have a real performance impact, though, but
> I think it won't hurt.

I think it's not obvious here which one should happen more often on the
hot path. I think it's better to follow the rule of thumb: if you are not
confident about the likelihood, just don't use (un)likely, and let the
branch predictor decide at runtime.

>
> Regards,
> Olivier
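The likely()/unlikely() question in this exchange is about static branch hints. A minimal sketch, assuming GCC or Clang: `my_likely`/`my_unlikely` are hypothetical local stand-ins (so the snippet compiles without DPDK headers) built on the `__builtin_expect` builtin, and `pick_source()` is a hypothetical classifier of where a get request would be served from. The hint only influences code layout and static prediction and never changes the value of the condition, which is why leaving it off a branch with no clear bias, as suggested in the reply, is safe.

```c
/* Local stand-ins for branch-hint macros such as DPDK's likely()/unlikely(),
 * built on the GCC/Clang __builtin_expect() builtin. */
#define my_likely(x)   __builtin_expect(!!(x), 1)
#define my_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical classification of where a get request is served from. */
enum get_source { FROM_CACHE, FROM_RING, FROM_BOTH, GET_FAIL };

static enum get_source pick_source(unsigned cache_len, unsigned ring_len,
				   unsigned n)
{
	/* No hint: how often the cache hits depends on the workload. */
	if (cache_len >= n)
		return FROM_CACHE;
	if (ring_len >= n)
		return FROM_RING;
	/* Combining ring and cache only helps when the pool is nearly
	 * drained, which is the branch v2 of the patch marks unlikely. */
	if (my_unlikely(cache_len > 0 && ring_len + cache_len >= n))
		return FROM_BOTH;
	return GET_FAIL;
}
```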
* [dpdk-dev] [PATCH v2] mempool: improve cache search

From: Zoltan Kiss @ 2015-07-01 9:03 UTC
To: dev

The current way has a few problems:

- if cache->len < n, we copy our elements into the cache first, then
  into obj_table, that's unnecessary
- if n >= cache_size (or the backfill fails), and we can't fulfil the
  request from the ring alone, we don't try to combine with the cache
- if refill fails, we don't return anything, even if the ring has enough
  for our request

This patch rewrites it substantially:
- in the first part of the function we only try the cache if cache->len >= n
- otherwise take our elements straight from the ring
- if that fails but we have something in the cache, try to combine them
- the refill happens at the end, and its failure doesn't modify our return
  value

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
---
v2:
- fix subject
- add unlikely() to the branch where the request is fulfilled from both the
  cache and the ring

 lib/librte_mempool/rte_mempool.h | 63 +++++++++++++++++++++++++---------------
 1 file changed, 39 insertions(+), 24 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 6d4ce9a..1e96f03 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -947,34 +947,14 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
 	unsigned lcore_id = rte_lcore_id();
 	uint32_t cache_size = mp->cache_size;
 
-	/* cache is not enabled or single consumer */
+	cache = &mp->local_cache[lcore_id];
+	/* cache is not enabled or single consumer or not enough */
 	if (unlikely(cache_size == 0 || is_mc == 0 ||
-		     n >= cache_size || lcore_id >= RTE_MAX_LCORE))
+		     cache->len < n || lcore_id >= RTE_MAX_LCORE))
 		goto ring_dequeue;
 
-	cache = &mp->local_cache[lcore_id];
 	cache_objs = cache->objs;
 
-	/* Can this be satisfied from the cache? */
-	if (cache->len < n) {
-		/* No. Backfill the cache first, and then fill from it */
-		uint32_t req = n + (cache_size - cache->len);
-
-		/* How many do we require i.e. number to fill the cache + the request */
-		ret = rte_ring_mc_dequeue_bulk(mp->ring, &cache->objs[cache->len], req);
-		if (unlikely(ret < 0)) {
-			/*
-			 * In the offchance that we are buffer constrained,
-			 * where we are not able to allocate cache + n, go to
-			 * the ring directly. If that fails, we are truly out of
-			 * buffers.
-			 */
-			goto ring_dequeue;
-		}
-
-		cache->len += req;
-	}
-
 	/* Now fill in the response ... */
 	for (index = 0, len = cache->len - 1; index < n; ++index, len--, obj_table++)
 		*obj_table = cache_objs[len];
@@ -983,7 +963,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void **obj_table,
 
 	__MEMPOOL_STAT_ADD(mp, get_success, n);
 
-	return 0;
+	ret = 0;
+	goto cache_refill;
 
 ring_dequeue:
 #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
@@ -994,11 +975,45 @@ ring_dequeue:
 	else
 		ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n);
 
+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+	if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) {
+		uint32_t req = n - cache->len;
+
+		ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req);
+		if (ret == 0) {
+			cache_objs = cache->objs;
+			obj_table += req;
+			for (index = 0; index < cache->len;
+			     ++index, ++obj_table)
+				*obj_table = cache_objs[index];
+			cache->len = 0;
+		}
+	}
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
 	if (ret < 0)
 		__MEMPOOL_STAT_ADD(mp, get_fail, n);
 	else
 		__MEMPOOL_STAT_ADD(mp, get_success, n);
 
+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
+cache_refill:
+	/* If previous dequeue was OK and we have less than n, start refill */
+	if (ret == 0 && cache_size > 0 && cache->len < n) {
+		uint32_t req = cache_size - cache->len;
+
+		cache_objs = cache->objs;
+		ret = rte_ring_mc_dequeue_bulk(mp->ring,
+				&cache->objs[cache->len],
+				req);
+		if (likely(ret == 0))
+			cache->len += req;
+		else
+			/* Don't spoil the return value */
+			ret = 0;
+	}
+#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */
+
 	return ret;
 }
 
-- 
1.9.1
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search

From: Ananyev, Konstantin @ 2015-07-02 17:07 UTC
To: Zoltan Kiss, dev

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> Sent: Wednesday, July 01, 2015 10:04 AM
> To: dev@dpdk.org
> Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search
>
> [...]
>  	if (ret < 0)
>  		__MEMPOOL_STAT_ADD(mp, get_fail, n);
>  	else
>  		__MEMPOOL_STAT_ADD(mp, get_success, n);
>
> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> +cache_refill:

Ok, so if I get things right: if the lcore runs out of entries in its cache,
then on the next __mempool_get_bulk() call it has to do ring_dequeue() twice:
1. to satisfy the user request
2. to refill the cache.
Right?
If that's so, then I think the current approach:
ring_dequeue() once to refill the cache, then copy entries from the cache to the user
is the cheaper (faster) one in many cases.
Especially when the same pool is shared between multiple threads.
For example, when a thread is doing RX only (no TX).

> +	/* If previous dequeue was OK and we have less than n, start refill */
> +	if (ret == 0 && cache_size > 0 && cache->len < n) {
> +		uint32_t req = cache_size - cache->len;

It could be that n > cache_size.
In that case, there is probably no point in refilling the cache, as you took
the entries from the ring and the cache was left intact.

Konstantin
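The cost argument in this review can be made concrete by counting ring operations. The following is a hypothetical model, not DPDK code: `struct pool_model` only tracks object counts, and each `ring_take()` call stands for one bulk dequeue, i.e. one consumer-head CAS plus one tail update. `get_refill_first()` follows the pre-patch order and `get_then_refill()` follows the v2 patch, with the combine path left out and with the `n <= cache_size` refill guard suggested in the review added.

```c
#include <stdbool.h>

/* Hypothetical model: tracks counts and ring round-trips, not pointers.
 * Assumes cache_len <= cache_size and cache_size > 0. */
struct pool_model {
	unsigned ring_len;   /* objects left in the shared ring */
	unsigned cache_len;  /* objects in this lcore's cache */
	unsigned cache_size; /* configured cache size */
	unsigned ring_ops;   /* bulk dequeues issued (head CAS + tail update each) */
};

/* One all-or-nothing bulk dequeue of n objects from the ring. */
static bool ring_take(struct pool_model *m, unsigned n)
{
	m->ring_ops++;
	if (m->ring_len < n)
		return false;
	m->ring_len -= n;
	return true;
}

/* Pre-patch order: backfill n + (cache_size - len) into the cache, then serve. */
static bool get_refill_first(struct pool_model *m, unsigned n)
{
	if (n >= m->cache_size)
		return ring_take(m, n); /* large requests bypass the cache */
	if (m->cache_len < n) {
		unsigned req = n + (m->cache_size - m->cache_len);

		if (!ring_take(m, req))
			return ring_take(m, n); /* fall back to the ring alone */
		m->cache_len += req;
	}
	m->cache_len -= n;
	return true;
}

/* Patch order: ring first when the cache is short, refill afterwards.
 * The n <= cache_size test is the guard suggested in the review. */
static bool get_then_refill(struct pool_model *m, unsigned n)
{
	bool ok;

	if (m->cache_len >= n) {
		m->cache_len -= n;
		ok = true;
	} else {
		ok = ring_take(m, n);
	}
	if (ok && n <= m->cache_size && m->cache_len < n &&
	    ring_take(m, m->cache_size - m->cache_len))
		m->cache_len = m->cache_size;
	return ok;
}
```

In this model, starting from an empty 32-entry cache, a request for 8 objects costs one ring operation in the pre-patch order and two in the patch order, and both end with 32 objects cached; a 40-object request in the patch order costs one ring operation and leaves the cache untouched because of the guard.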
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search

From: Zoltan Kiss @ 2015-07-07 17:17 UTC
To: Ananyev, Konstantin, dev

On 02/07/15 18:07, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
>> Sent: Wednesday, July 01, 2015 10:04 AM
>> To: dev@dpdk.org
>> Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search
>>
>> [...]
>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
>> +cache_refill:
>
> Ok, so if I get things right: if the lcore runs out of entries in its cache,
> then on the next __mempool_get_bulk() call it has to do ring_dequeue() twice:
> 1. to satisfy the user request
> 2. to refill the cache.
> Right?
Yes.

> If that's so, then I think the current approach:
> ring_dequeue() once to refill the cache, then copy entries from the cache to the user
> is the cheaper (faster) one in many cases.
But then you can't return anything if the refill fails, even if there would
be enough in the ring (or ring+cache combined), unless you retry with just
n.
__rte_ring_mc_do_dequeue is inlined; as far as I can see, the overhead of
calling it twice is:
- checking the number of entries in the ring, and the atomic cmpset of
  cons.head, again. This can loop if another dequeue preceded us while we
  were doing that subtraction, but as that's a very short interval, I think
  it's not very likely
- an extra rte_compiler_barrier()
- waiting for preceding dequeues to finish, and setting cons.tail to the
  new value. I think this can happen often when 'n' has a big variation,
  so the previous dequeue can easily be much bigger
- the statistics update
I guess if there is no contention on the ring, the extra memcpy easily
outweighs these. And my gut feeling says that contention around the two
while loops should not be high, but I don't have hard facts.
Another argument for doing two dequeues is that we can do a burst dequeue
for the cache refill, which is better than only accepting the full amount.

How about the following? If the cache can't satisfy the request, we do a
dequeue from the ring to the cache for n + cache_size, but with
rte_ring_mc_dequeue_burst. So it takes as many as it can, but doesn't fail
if it can't take the whole amount. Then we copy from the cache to
obj_table, if there is enough. This makes sure we use as much as possible,
with one ring dequeue.

> Especially when the same pool is shared between multiple threads.
> For example, when a thread is doing RX only (no TX).
>
>
>> +	/* If previous dequeue was OK and we have less than n, start refill */
>> +	if (ret == 0 && cache_size > 0 && cache->len < n) {
>> +		uint32_t req = cache_size - cache->len;
>
>
> It could be that n > cache_size.
> In that case, there is probably no point in refilling the cache, as you took
> the entries from the ring and the cache was left intact.
Yes, it makes sense to add.
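The per-call costs listed in this reply (a consumer-head CAS that may retry, then waiting for earlier consumers before publishing the new tail) come from the multi-consumer ring protocol. The following is a simplified C11 sketch of that protocol for illustration, not the DPDK implementation: `struct mc_ring` and `mc_dequeue_bulk()` are hypothetical, the producer side is reduced to a single `prod_tail` counter, the ring size is a fixed power of two, and the memory orderings are chosen conservatively rather than tuned.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SZ 16u             /* hypothetical size, must be a power of two */
#define RING_MASK (RING_SZ - 1)

struct mc_ring {
	_Atomic uint32_t prod_tail; /* objects published by producers */
	_Atomic uint32_t cons_head; /* next slot a consumer may reserve */
	_Atomic uint32_t cons_tail; /* slots fully released by consumers */
	void *slots[RING_SZ];
};

/* All-or-nothing multi-consumer dequeue; returns 0 or -1. */
static int mc_dequeue_bulk(struct mc_ring *r, void **dst, uint32_t n)
{
	uint32_t head, next;

	/* 1. Check availability and reserve [head, head + n) with a CAS.
	 *    The loop retries if another consumer moved the head meanwhile. */
	head = atomic_load_explicit(&r->cons_head, memory_order_relaxed);
	do {
		uint32_t avail = atomic_load_explicit(&r->prod_tail,
						      memory_order_acquire) - head;
		if (avail < n)
			return -1;
		next = head + n;
	} while (!atomic_compare_exchange_weak_explicit(&r->cons_head, &head,
							next,
							memory_order_acquire,
							memory_order_relaxed));

	/* 2. Copy the reserved objects out. */
	for (uint32_t i = 0; i < n; i++)
		dst[i] = r->slots[(head + i) & RING_MASK];

	/* 3. Wait for consumers that reserved earlier ranges, then publish. */
	while (atomic_load_explicit(&r->cons_tail, memory_order_acquire) != head)
		; /* spin */
	atomic_store_explicit(&r->cons_tail, next, memory_order_release);
	return 0;
}
```

Every call pays for steps 1 and 3 regardless of n, which is the fixed overhead a second dequeue per get would add; the spin in step 3 is where a large preceding dequeue can hold up a later consumer.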
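The alternative proposed in this reply (one burst dequeue into the cache, then serving the request from the cache if enough objects arrived) might look like the sketch below. Assumptions: `burst_ring`, `burst_cache`, `ring_dequeue_burst()` and `get_burst_refill()` are hypothetical names; the toy burst dequeue, like rte_ring_mc_dequeue_burst(), takes as many objects as are available up to the requested count and returns how many it took; there is no concurrency. The sketch asks for the shortfall plus a full cache, `(n - len) + cache_size`, rather than a literal `n + cache_size`, so the cache array stays bounded by `CACHE_SIZE + MAX_REQ`.

```c
#include <assert.h>

#define CACHE_SIZE 8 /* hypothetical configured cache size */
#define MAX_REQ 16   /* hypothetical upper bound on n */
#define RING_CAP 64  /* hypothetical ring capacity */

struct burst_ring {
	void *objs[RING_CAP];
	unsigned len;
};

/* Room for one maximal request on top of a full cache. */
struct burst_cache {
	void *objs[CACHE_SIZE + MAX_REQ];
	unsigned len;
};

/* Burst semantics: take up to n objects, return how many were taken. */
static unsigned ring_dequeue_burst(struct burst_ring *r, void **dst, unsigned n)
{
	unsigned got = r->len < n ? r->len : n;

	for (unsigned i = 0; i < got; i++)
		dst[i] = r->objs[--r->len];
	return got;
}

/* 0 on success, -1 if the ring and cache together hold fewer than n. */
static int get_burst_refill(struct burst_ring *r, struct burst_cache *c,
			    void **obj_table, unsigned n)
{
	assert(n <= MAX_REQ);
	if (c->len < n) {
		/* One ring operation: the shortfall plus a full cache's worth. */
		unsigned want = (n - c->len) + CACHE_SIZE;

		c->len += ring_dequeue_burst(r, &c->objs[c->len], want);
		if (c->len < n)
			return -1; /* whatever arrived stays in the cache */
	}
	/* Serve from the top of the cache. */
	for (unsigned i = 0; i < n; i++)
		obj_table[i] = c->objs[--c->len];
	return 0;
}
```

One design consequence worth noting: on failure, the objects already pulled from the ring stay in this lcore's cache rather than going back to the ring, so other cores cannot see them until this core frees or uses them.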
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search

From: Bruce Richardson @ 2015-07-08 9:27 UTC
To: Zoltan Kiss
Cc: dev

On Tue, Jul 07, 2015 at 06:17:05PM +0100, Zoltan Kiss wrote:
>
>
> On 02/07/15 18:07, Ananyev, Konstantin wrote:
> >
> >
> >>-----Original Message-----
> >>From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss
> >>Sent: Wednesday, July 01, 2015 10:04 AM
> >>To: dev@dpdk.org
> >>Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search
> >>
> >>[...]
> >>+#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0
> >>+cache_refill:
> >
> >Ok, so if I get things right: if the lcore runs out of entries in cache,
> >then on next __mempool_get_bulk() it has to do ring_dequeue() twice:
> >1. to satisfy user request
> >2. to refill the cache.
> >Right?
> Yes.
>
> >If that so, then I think the current approach:
> >ring_dequeue() once to refill the cache, then copy entries from the cache to the user
> >is a cheaper(faster) one for many cases.
> But then you can't return anything if the refill fails, even if there would
> be enough in the ring (or ring+cache combined). Unless you retry with just
> n.
> __rte_ring_mc_do_dequeue is inlined, as far as I see the overhead of calling
> twice is:
> - check the number of entries in the ring, and atomic cmpset of cons.head
> again.
This can loop if an other dequeue preceded us while doing that > subtraction, but as that's a very short interval, I think it's not very > likely > - an extra rte_compiler_barrier() > - wait for preceding dequeues to finish, and set cons.tail to the new value. > I think this can happen often when 'n' has a big variation, so the previous > dequeue can be easily much bigger > - statistics update > > I guess if there is no contention on the ring the extra memcpy outweighs > these easily. And my gut feeling says that contention around the two while > loop should not be high unless, but I don't have hard facts. > An another argument for doing two dequeue because we can do burst dequeue > for the cache refill, which is better than only accepting the full amount. > > How about the following? > If the cache can't satisfy the request, we do a dequeue from the ring to the > cache for n + cache_size, but with rte_ring_mc_dequeue_burst. So it takes as > many as it can, but doesn't fail if it can't take the whole. > Then we copy from cache to obj_table, if there is enough. > It makes sure we utilize as much as possible, with one ring dequeue. > That sounds like an approach that may work better. The cost of doing the cmpset in the dequeue is likely to be the most expensive part of the whole operation so we should try and minimise dequeues if at all possible. /Bruce
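Konstantin's cost argument (a cache miss under v2 means one ring dequeue for the request plus one for the refill) can be made concrete with a small single-threaded model. This is a sketch under stated assumptions, not DPDK code: `toy_ring`, `toy_cache`, `toy_ring_init()`, `toy_ring_dequeue_bulk()` and `toy_get_bulk_v2()` are hypothetical names. The ring is a plain pointer array whose bulk dequeue is all-or-nothing, loosely like `rte_ring_mc_dequeue_bulk()`, and there is no concurrency, so the model only counts ring accesses (where the cmpset on `cons.head` would happen) rather than measuring contention. The get path follows the four steps described in the v2 commit message, assuming the cache is enabled and the pool is multi-consumer.

```c
#include <assert.h>

/* Hypothetical single-threaded model, not DPDK code. */
#define TOY_RING_CAP 1024
#define TOY_CACHE_CAP 512

struct toy_ring {
    void *objs[TOY_RING_CAP];
    unsigned count;         /* objects currently in the ring */
    unsigned dequeue_calls; /* ring accesses (a cmpset round trip in DPDK) */
};

struct toy_cache {
    void *objs[TOY_CACHE_CAP];
    unsigned len;
};

static void toy_ring_init(struct toy_ring *r, void **objs, unsigned n)
{
    assert(n <= TOY_RING_CAP);
    for (unsigned i = 0; i < n; i++)
        r->objs[i] = objs[i];
    r->count = n;
    r->dequeue_calls = 0;
}

/* All-or-nothing: 0 on success, -1 when fewer than n objects are
 * available (the real rte_ring bulk call reports -ENOENT instead). */
static int toy_ring_dequeue_bulk(struct toy_ring *r, void **out, unsigned n)
{
    r->dequeue_calls++;
    if (r->count < n)
        return -1;
    for (unsigned i = 0; i < n; i++)
        out[i] = r->objs[--r->count];
    return 0;
}

/* Model of the v2 flow; assumes c->len <= cache_size on entry. */
static int toy_get_bulk_v2(struct toy_ring *r, struct toy_cache *c,
                           unsigned cache_size, void **obj_table, unsigned n)
{
    int ret;

    assert(c->len <= cache_size && cache_size <= TOY_CACHE_CAP);
    if (c->len >= n) {
        /* 1. Cache holds enough: serve from its top, no ring access. */
        for (unsigned i = 0; i < n; i++)
            obj_table[i] = c->objs[--c->len];
        ret = 0;
    } else {
        /* 2. Otherwise take all n straight from the ring. */
        ret = toy_ring_dequeue_bulk(r, obj_table, n);
        /* 3. Ring short too: combine (n - len) from the ring with the cache. */
        if (ret < 0 && c->len > 0) {
            unsigned req = n - c->len; /* c->len < n here, no underflow */

            ret = toy_ring_dequeue_bulk(r, obj_table, req);
            if (ret == 0) {
                for (unsigned i = 0; i < c->len; i++)
                    obj_table[req + i] = c->objs[i];
                c->len = 0;
            }
        }
    }

    /* 4. Refill up to cache_size; a failed refill must not spoil ret. */
    if (ret == 0 && cache_size > 0 && c->len < n) {
        unsigned req = cache_size - c->len;

        if (toy_ring_dequeue_bulk(r, &c->objs[c->len], req) == 0)
            c->len += req;
    }
    return ret;
}
```

With an empty cache and `cache_size` 32, an 8-object request costs two ring accesses (the request, then the refill to 32); the following 8-object request is served from the cache without touching the ring. That second ring access on every miss is the extra cmpset Bruce would rather avoid.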
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search 2015-07-07 17:17 ` Zoltan Kiss 2015-07-08 9:27 ` Bruce Richardson @ 2015-07-15 8:56 ` Olivier MATZ 1 sibling, 0 replies; 10+ messages in thread From: Olivier MATZ @ 2015-07-15 8:56 UTC (permalink / raw) To: Zoltan Kiss, Ananyev, Konstantin, dev Hi, On 07/07/2015 07:17 PM, Zoltan Kiss wrote: > > > On 02/07/15 18:07, Ananyev, Konstantin wrote: >> >> >>> -----Original Message----- >>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Zoltan Kiss >>> Sent: Wednesday, July 01, 2015 10:04 AM >>> To: dev@dpdk.org >>> Subject: [dpdk-dev] [PATCH v2] mempool: improve cache search >>> >>> The current way has a few problems: >>> >>> - if cache->len < n, we copy our elements into the cache first, then >>> into obj_table, that's unnecessary >>> - if n >= cache_size (or the backfill fails), and we can't fulfil the >>> request from the ring alone, we don't try to combine with the cache >>> - if refill fails, we don't return anything, even if the ring has enough >>> for our request >>> >>> This patch rewrites it severely: >>> - at the first part of the function we only try the cache if >>> cache->len < n >>> - otherwise take our elements straight from the ring >>> - if that fails but we have something in the cache, try to combine them >>> - the refill happens at the end, and its failure doesn't modify our >>> return >>> value >>> >>> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org> >>> --- >>> v2: >>> - fix subject >>> - add unlikely for branch where request is fulfilled both from cache >>> and ring >>> >>> lib/librte_mempool/rte_mempool.h | 63 >>> +++++++++++++++++++++++++--------------- >>> 1 file changed, 39 insertions(+), 24 deletions(-) >>> >>> diff --git a/lib/librte_mempool/rte_mempool.h >>> b/lib/librte_mempool/rte_mempool.h >>> index 6d4ce9a..1e96f03 100644 >>> --- a/lib/librte_mempool/rte_mempool.h >>> +++ b/lib/librte_mempool/rte_mempool.h >>> @@ -947,34 +947,14 @@ __mempool_get_bulk(struct rte_mempool *mp, 
void >>> **obj_table, >>> unsigned lcore_id = rte_lcore_id(); >>> uint32_t cache_size = mp->cache_size; >>> >>> - /* cache is not enabled or single consumer */ >>> + cache = &mp->local_cache[lcore_id]; >>> + /* cache is not enabled or single consumer or not enough */ >>> if (unlikely(cache_size == 0 || is_mc == 0 || >>> - n >= cache_size || lcore_id >= RTE_MAX_LCORE)) >>> + cache->len < n || lcore_id >= RTE_MAX_LCORE)) >>> goto ring_dequeue; >>> >>> - cache = &mp->local_cache[lcore_id]; >>> cache_objs = cache->objs; >>> >>> - /* Can this be satisfied from the cache? */ >>> - if (cache->len < n) { >>> - /* No. Backfill the cache first, and then fill from it */ >>> - uint32_t req = n + (cache_size - cache->len); >>> - >>> - /* How many do we require i.e. number to fill the cache + >>> the request */ >>> - ret = rte_ring_mc_dequeue_bulk(mp->ring, >>> &cache->objs[cache->len], req); >>> - if (unlikely(ret < 0)) { >>> - /* >>> - * In the offchance that we are buffer constrained, >>> - * where we are not able to allocate cache + n, go to >>> - * the ring directly. If that fails, we are truly out of >>> - * buffers. >>> - */ >>> - goto ring_dequeue; >>> - } >>> - >>> - cache->len += req; >>> - } >>> - >>> /* Now fill in the response ... 
*/ >>> for (index = 0, len = cache->len - 1; index < n; ++index, >>> len--, obj_table++) >>> *obj_table = cache_objs[len]; >>> @@ -983,7 +963,8 @@ __mempool_get_bulk(struct rte_mempool *mp, void >>> **obj_table, >>> >>> __MEMPOOL_STAT_ADD(mp, get_success, n); >>> >>> - return 0; >>> + ret = 0; >>> + goto cache_refill; >>> >>> ring_dequeue: >>> #endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */ >>> @@ -994,11 +975,45 @@ ring_dequeue: >>> else >>> ret = rte_ring_sc_dequeue_bulk(mp->ring, obj_table, n); >>> >>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 >>> + if (unlikely(ret < 0 && is_mc == 1 && cache->len > 0)) { >>> + uint32_t req = n - cache->len; >>> + >>> + ret = rte_ring_mc_dequeue_bulk(mp->ring, obj_table, req); >>> + if (ret == 0) { >>> + cache_objs = cache->objs; >>> + obj_table += req; >>> + for (index = 0; index < cache->len; >>> + ++index, ++obj_table) >>> + *obj_table = cache_objs[index]; >>> + cache->len = 0; >>> + } >>> + } >>> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */ >>> + >>> if (ret < 0) >>> __MEMPOOL_STAT_ADD(mp, get_fail, n); >>> else >>> __MEMPOOL_STAT_ADD(mp, get_success, n); >>> >>> +#if RTE_MEMPOOL_CACHE_MAX_SIZE > 0 >>> +cache_refill: >> >> Ok, so if I get things right: if the lcore runs out of entries in cache, >> then on next __mempool_get_bulk() it has to do ring_dequeue() twice: >> 1. to satisfy user request >> 2. to refill the cache. >> Right? > Yes. > >> If that so, then I think the current approach: >> ring_dequeue() once to refill the cache, then copy entries from the >> cache to the user >> is a cheaper(faster) one for many cases. > But then you can't return anything if the refill fails, even if there > would be enough in the ring (or ring+cache combined). Unless you retry > with just n. > __rte_ring_mc_do_dequeue is inlined, as far as I see the overhead of > calling twice is: > - check the number of entries in the ring, and atomic cmpset of > cons.head again. 
This can loop if an other dequeue preceded us while > doing that subtraction, but as that's a very short interval, I think > it's not very likely > - an extra rte_compiler_barrier() > - wait for preceding dequeues to finish, and set cons.tail to the new > value. I think this can happen often when 'n' has a big variation, so > the previous dequeue can be easily much bigger > - statistics update > > I guess if there is no contention on the ring the extra memcpy outweighs > these easily. And my gut feeling says that contention around the two > while loop should not be high unless, but I don't have hard facts. > An another argument for doing two dequeue because we can do burst > dequeue for the cache refill, which is better than only accepting the > full amount. > > How about the following? > If the cache can't satisfy the request, we do a dequeue from the ring to > the cache for n + cache_size, but with rte_ring_mc_dequeue_burst. So it > takes as many as it can, but doesn't fail if it can't take the whole. > Then we copy from cache to obj_table, if there is enough. > It makes sure we utilize as much as possible, with one ring dequeue. Will it be possible to dequeue "n + cache_size"? I think it would require to allocate some space to store the object pointers, right? I don't feel it's a good idea to use a dynamic local table (or alloca()) that depends on n. > > > > >> Especially when same pool is shared between multiple threads. >> For example when thread is doing RX only (no TX). >> >> >>> + /* If previous dequeue was OK and we have less than n, start >>> refill */ >>> + if (ret == 0 && cache_size > 0 && cache->len < n) { >>> + uint32_t req = cache_size - cache->len; >> >> >> It could be that n > cache_size. >> For that case, there probably no point to refill the cache, as you >> took entrires from the ring >> and cache was intact. > > Yes, it makes sense to add. 
>> >> Konstantin >> >>> + >>> + cache_objs = cache->objs; >>> + ret = rte_ring_mc_dequeue_bulk(mp->ring, >>> + &cache->objs[cache->len], >>> + req); >>> + if (likely(ret == 0)) >>> + cache->len += req; >>> + else >>> + /* Don't spoil the return value */ >>> + ret = 0; >>> + } >>> +#endif /* RTE_MEMPOOL_CACHE_MAX_SIZE > 0 */ >>> + >>> return ret; >>> } >>> >>> -- >>> 1.9.1 >>
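Zoltan's single-dequeue proposal, Olivier's question about where "n + cache_size" pointers would be stored, and Konstantin's remark about `n > cache_size` can be combined into one possible design, sketched below. Everything here is hypothetical and single-threaded: `toy_ring`, `toy_cache`, `toy_ring_dequeue_bulk()`, `toy_ring_dequeue_burst()` and `toy_get_bulk_burst()` are illustrative names, not DPDK APIs, and this is not necessarily what was eventually merged. The burst helper mimics the "take what is available" behaviour of `rte_ring_mc_dequeue_burst()`. The assumption that answers Olivier's concern is that requests larger than `cache_size` bypass the cache entirely, so at most `cache_size + n <= 2 * cache_size` pointers ever land in the cache array: a compile-time bound rather than a table sized by `n` or an `alloca()`.

```c
#include <assert.h>

/* Hypothetical single-threaded model, not DPDK code. */
#define TOY_RING_CAP 1024
#define TOY_CACHE_SIZE 32                  /* the pool's cache_size */
#define TOY_CACHE_CAP (2 * TOY_CACHE_SIZE) /* fixed bound, no alloca() */

struct toy_ring {
    void *objs[TOY_RING_CAP];
    unsigned count;         /* objects currently in the ring */
    unsigned dequeue_calls; /* ring accesses (a cmpset round trip in DPDK) */
};

struct toy_cache {
    void *objs[TOY_CACHE_CAP];
    unsigned len;
};

static void toy_ring_init(struct toy_ring *r, void **objs, unsigned n)
{
    assert(n <= TOY_RING_CAP);
    for (unsigned i = 0; i < n; i++)
        r->objs[i] = objs[i];
    r->count = n;
    r->dequeue_calls = 0;
}

/* All-or-nothing: 0 on success, -1 when fewer than n are available. */
static int toy_ring_dequeue_bulk(struct toy_ring *r, void **out, unsigned n)
{
    r->dequeue_calls++;
    if (r->count < n)
        return -1;
    for (unsigned i = 0; i < n; i++)
        out[i] = r->objs[--r->count];
    return 0;
}

/* Takes up to n objects and returns how many it actually took. */
static unsigned toy_ring_dequeue_burst(struct toy_ring *r, void **out,
                                       unsigned n)
{
    unsigned k = n < r->count ? n : r->count;

    r->dequeue_calls++;
    for (unsigned i = 0; i < k; i++)
        out[i] = r->objs[--r->count];
    return k;
}

static int toy_get_bulk_burst(struct toy_ring *r, struct toy_cache *c,
                              void **obj_table, unsigned n)
{
    /* Requests larger than the cache gain nothing from it: go to the ring. */
    if (n > TOY_CACHE_SIZE)
        return toy_ring_dequeue_bulk(r, obj_table, n);

    if (c->len < n) {
        /* One ring access: refill to cache_size on top of this request.
         * New len <= TOY_CACHE_SIZE + n <= TOY_CACHE_CAP. */
        unsigned want = TOY_CACHE_SIZE + n - c->len;

        c->len += toy_ring_dequeue_burst(r, &c->objs[c->len], want);
        /* A short burst drained the ring, so ring + cache cannot cover n. */
        if (c->len < n)
            return -1;
    }
    for (unsigned i = 0; i < n; i++)
        obj_table[i] = c->objs[--c->len];
    return 0;
}
```

In this model a cache miss costs one ring access instead of two, and the fallback combination in the v2 patch is no longer needed because a short burst already means the ring is empty. One trade-off worth weighing: when the ring runs short, the objects the burst did take stay parked in this lcore's cache while the call fails, so other lcores cannot use them until a later get or put on this lcore.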
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search 2015-07-01 9:03 ` [dpdk-dev] [PATCH v2] mempool: improve " Zoltan Kiss 2015-07-02 17:07 ` Ananyev, Konstantin @ 2015-07-03 13:32 ` Olivier MATZ 2015-07-03 13:44 ` Olivier MATZ 1 sibling, 1 reply; 10+ messages in thread From: Olivier MATZ @ 2015-07-03 13:32 UTC (permalink / raw) To: Zoltan Kiss, dev On 07/01/2015 11:03 AM, Zoltan Kiss wrote: > The current way has a few problems: > > - if cache->len < n, we copy our elements into the cache first, then > into obj_table, that's unnecessary > - if n >= cache_size (or the backfill fails), and we can't fulfil the > request from the ring alone, we don't try to combine with the cache > - if refill fails, we don't return anything, even if the ring has enough > for our request > > This patch rewrites it severely: > - at the first part of the function we only try the cache if cache->len < n > - otherwise take our elements straight from the ring > - if that fails but we have something in the cache, try to combine them > - the refill happens at the end, and its failure doesn't modify our return > value > > Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org> Acked-by: Olivier Matz <olivier.matz@6wind.com>
* Re: [dpdk-dev] [PATCH v2] mempool: improve cache search 2015-07-03 13:32 ` Olivier MATZ @ 2015-07-03 13:44 ` Olivier MATZ 0 siblings, 0 replies; 10+ messages in thread From: Olivier MATZ @ 2015-07-03 13:44 UTC (permalink / raw) To: Zoltan Kiss, dev On 07/03/2015 03:32 PM, Olivier MATZ wrote: > > > On 07/01/2015 11:03 AM, Zoltan Kiss wrote: >> The current way has a few problems: >> >> - if cache->len < n, we copy our elements into the cache first, then >> into obj_table, that's unnecessary >> - if n >= cache_size (or the backfill fails), and we can't fulfil the >> request from the ring alone, we don't try to combine with the cache >> - if refill fails, we don't return anything, even if the ring has enough >> for our request >> >> This patch rewrites it severely: >> - at the first part of the function we only try the cache if cache->len < n >> - otherwise take our elements straight from the ring >> - if that fails but we have something in the cache, try to combine them >> - the refill happens at the end, and its failure doesn't modify our return >> value >> >> Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org> > > > Acked-by: Olivier Matz <olivier.matz@6wind.com> > Please ignore, sorry, I missed Konstantin's relevant comment.
Thread overview: 10+ messages
2015-06-25 18:48 [dpdk-dev] [PATCH] mempool: improbe cache search Zoltan Kiss
2015-06-30 11:58 ` Olivier MATZ
2015-06-30 13:59 ` Zoltan Kiss
2015-07-01 9:03 ` [dpdk-dev] [PATCH v2] mempool: improve " Zoltan Kiss
2015-07-02 17:07 ` Ananyev, Konstantin
2015-07-07 17:17 ` Zoltan Kiss
2015-07-08 9:27 ` Bruce Richardson
2015-07-15 8:56 ` Olivier MATZ
2015-07-03 13:32 ` Olivier MATZ
2015-07-03 13:44 ` Olivier MATZ