DPDK patches and discussions
From: Olivier Matz <olivier.matz@6wind.com>
To: "Morten Brørup" <mb@smartsharesystems.com>
Cc: andrew.rybchenko@oktetlabs.ru, jerinj@marvell.com,
	thomas@monjalon.net, bruce.richardson@intel.com, dev@dpdk.org
Subject: Re: [PATCH] mempool: cache align mempool cache objects
Date: Thu, 27 Oct 2022 10:34:42 +0200
Message-ID: <Y1pCovW8C2TbCq42@platinum>
In-Reply-To: <20221026144436.71068-1-mb@smartsharesystems.com>

Hi Morten,

On Wed, Oct 26, 2022 at 04:44:36PM +0200, Morten Brørup wrote:
> Add __rte_cache_aligned to the objs array.
> 
> It makes no difference in the general case, but if get/put operations are
> always 32 objects, it will reduce the number of memory (or last level
> cache) accesses from five to four 64 B cache lines for every get/put
> operation.
> 
> For readability reasons, an example using 16 objects follows:
> 
> Currently, with 16 objects (128 B), we access 3
> cache lines:
> 
>       ┌────────┐
>       │len     │
> cache │********│---
> line0 │********│ ^
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line1 │********│ |
>       │********│ |
>       ├────────┤ |
>       │********│_v_
> cache │        │
> line2 │        │
>       │        │
>       └────────┘
> 
> With the alignment, it is also 3 cache lines:
> 
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤---
>       │********│ ^
> cache │********│ |
> line1 │********│ |
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line2 │********│ |
>       │********│ v
>       └────────┘---
> 
> However, accessing the objects at the bottom of the mempool cache is a
> special case, where cache line0 is also used for objects.
> 
> Consider the next burst (and any following bursts):
> 
> Current:
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line1 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │********│---
> line2 │********│ ^
>       │********│ |
>       ├────────┤ | 16 objects
>       │********│ | 128B
> cache │********│ |
> line3 │********│ |
>       │********│ |
>       ├────────┤ |
>       │********│_v_
> cache │        │
> line4 │        │
>       │        │
>       └────────┘
> 4 cache lines touched, incl. line0 for len.
> 
> With the proposed alignment:
>       ┌────────┐
>       │len     │
> cache │        │
> line0 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line1 │        │
>       │        │
>       ├────────┤
>       │        │
> cache │        │
> line2 │        │
>       │        │
>       ├────────┤
>       │********│---
> cache │********│ ^
> line3 │********│ |
>       │********│ | 16 objects
>       ├────────┤ | 128B
>       │********│ |
> cache │********│ |
> line4 │********│ |
>       │********│_v_
>       └────────┘
> Only 3 cache lines touched, incl. line0 for len.

I understand your logic, but are we sure that having an application that
works with bulks of 32 means that the cache will stay aligned to 32
elements for the whole life of the application?

In an application, the alignment of the cache can change if you have
any of:
- software queues (reassembly for instance)
- packet duplication (bridge, multicast)
- locally generated packets (keepalive, control protocol)
- pipeline to other cores
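
To illustrate the drift, here is a minimal sketch, not DPDK code. It
assumes 64-byte cache lines and 8-byte pointers (8 pointers per line),
takes the proposed aligned objs[] layout, and ignores refills from and
flushes to the backend. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define PTRS_PER_LINE 8u /* assumption: 64-byte line / 8-byte pointer */

/* With objs[] starting on a cache line boundary, the top of a cache
 * holding len objects sits on a line boundary only if len is a
 * multiple of PTRS_PER_LINE. */
static bool
top_is_line_aligned(unsigned int len)
{
	return len % PTRS_PER_LINE == 0;
}

/* Apply a sequence of deltas to the fill level (put > 0, get < 0) and
 * count how many of the resulting states are line aligned. */
static unsigned int
count_aligned_states(unsigned int len, const int *delta, unsigned int n)
{
	unsigned int aligned = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		/* a get must not take more objects than the cache holds */
		assert(delta[i] >= 0 || (unsigned int)(-delta[i]) <= len);
		len = (unsigned int)((int)len + delta[i]);
		if (top_is_line_aligned(len))
			aligned++;
	}
	return aligned;
}
```

Starting from len=256, the sequence {-32, +32, -32, +32} stays aligned
in all 4 states. Adding a single get of 1 object, {-32, -1, +32, -32,
+32}, leaves only the first state aligned: later bursts of 32 never
restore the alignment on their own.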

Even with testpmd, which works in bursts of 32, I can see that the fill
level of the cache is not aligned to 32. Right after starting the
application, we already have this:

  internal cache infos:
    cache_size=250
    cache_count[0]=231

This is probably related to the hw rx ring size, the number of queues,
and the number of ports.

The "250" default value for cache size in testpmd is questionable, but
with --mbcache=256, the behavior is similar.

Also, when we transmit to a NIC, the mbufs are not returned immediately
to the pool: they may stay in the hw tx ring for some time, which is
a driver decision.

After processing traffic on cores 8 and 24 with this testpmd, I get:
    cache_count[0]=231
    cache_count[8]=123
    cache_count[24]=122

In my opinion, it is not realistic to think that the mempool cache will
remain aligned to cachelines. In these conditions, it looks better to
keep the structure packed to avoid wasting memory.
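
As a rough check, here is a sketch (not DPDK code) that counts the cache
lines touched by a get of n objects for both layouts. Assumptions: 64 B
cache lines, 8 B pointers, objs[] at offset 16 in the packed structure
and at offset 64 with the proposed alignment (as in the diagrams), and
a get that reads the top n entries. Line 0 is always counted for len.
The function and macro names are hypothetical.

```c
#include <assert.h>

#define LINE_SIZE        64u /* cache line size in bytes (assumption) */
#define PTR_SIZE          8u /* assumes 64-bit object pointers */
#define OBJS_OFF_PACKED  16u /* objs[] right after the three counters */
#define OBJS_OFF_ALIGNED 64u /* objs[] moved to the next cache line */

/* Cache lines touched when getting n objects from a cache holding len
 * objects: entries objs[len - n] .. objs[len - 1], plus line 0 for the
 * len field if the objects do not already overlap it. */
static unsigned int
lines_touched(unsigned int objs_off, unsigned int len, unsigned int n)
{
	unsigned int first, last, lines;

	assert(n >= 1 && n <= len);
	first = (objs_off + (len - n) * PTR_SIZE) / LINE_SIZE;
	last = (objs_off + len * PTR_SIZE - 1) / LINE_SIZE;
	lines = last - first + 1;
	if (first != 0)
		lines++; /* line 0 is read for len anyway */
	return lines;
}
```

Under these assumptions, a get of 32 with len=256 touches 6 lines
packed and 5 lines aligned, but with the fill levels observed above
(231, 123, 122) both layouts touch 6 lines, so the alignment gains
nothing there.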

Olivier


> 
> Credits go to Olivier Matz for the nice ASCII graphics.
> 
> Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 1f5707f46a..3725a72951 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -86,11 +86,13 @@ struct rte_mempool_cache {
>  	uint32_t size;	      /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len;	      /**< Current cache count */
> -	/*
> +	/**
> +	 * Cache objects
> +	 *
>  	 * Cache is allocated to this size to allow it to overflow in certain
>  	 * cases to avoid needless emptying of cache.
>  	 */
> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2]; /**< Cache objects */
> +	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
>  } __rte_cache_aligned;
>  
>  /**
> -- 
> 2.17.1
> 
