DPDK patches and discussions
 help / color / mirror / Atom feed
From: Dharmik Thakkar <dharmik.thakkar@arm.com>
Cc: dev@dpdk.org, nd@arm.com, honnappa.nagarahalli@arm.com,
	ruifeng.wang@arm.com, Dharmik Thakkar <dharmik.thakkar@arm.com>
Subject: [PATCH v2 0/1] mempool: implement index-based per core cache
Date: Wed, 12 Jan 2022 23:36:29 -0600
Message-ID: <20220113053630.886638-1-dharmik.thakkar@arm.com> (raw)
In-Reply-To: <20211224225923.806498-1-dharmik.thakkar@arm.com>

Current mempool per core cache implementation stores pointers to mbufs
On 64b architectures, each pointer consumes 8B
This patch replaces it with index-based implementation,
where in each buffer is addressed by (pool base address + index)
It reduces the amount of memory/cache required for per core cache

L3Fwd performance testing reveals minor improvements in the cache
performance (L1 and L2 misses reduced by 0.60%)
with no change in throughput

Micro-benchmarking the patch using mempool_perf_test shows
significant improvement with majority of the test cases

Number of cores = 1:
n_get_bulk=1 n_put_bulk=1 n_keep=32 %_change_with_patch=18.01
n_get_bulk=1 n_put_bulk=1 n_keep=128 %_change_with_patch=19.91
n_get_bulk=1 n_put_bulk=4 n_keep=32 %_change_with_patch=-20.37 (regression)
n_get_bulk=1 n_put_bulk=4 n_keep=128 %_change_with_patch=-17.01 (regression) 
n_get_bulk=1 n_put_bulk=32 n_keep=32 %_change_with_patch=-25.06 (regression)
n_get_bulk=1 n_put_bulk=32 n_keep=128 %_change_with_patch=-23.81 (regression)
n_get_bulk=4 n_put_bulk=1 n_keep=32 %_change_with_patch=53.93
n_get_bulk=4 n_put_bulk=1 n_keep=128 %_change_with_patch=60.90
n_get_bulk=4 n_put_bulk=4 n_keep=32 %_change_with_patch=1.64
n_get_bulk=4 n_put_bulk=4 n_keep=128 %_change_with_patch=8.76
n_get_bulk=4 n_put_bulk=32 n_keep=32 %_change_with_patch=-4.71 (regression)
n_get_bulk=4 n_put_bulk=32 n_keep=128 %_change_with_patch=-3.19 (regression)
n_get_bulk=32 n_put_bulk=1 n_keep=32 %_change_with_patch=65.63
n_get_bulk=32 n_put_bulk=1 n_keep=128 %_change_with_patch=75.19
n_get_bulk=32 n_put_bulk=4 n_keep=32 %_change_with_patch=11.75
n_get_bulk=32 n_put_bulk=4 n_keep=128 %_change_with_patch=15.52
n_get_bulk=32 n_put_bulk=32 n_keep=32 %_change_with_patch=13.45
n_get_bulk=32 n_put_bulk=32 n_keep=128 %_change_with_patch=11.58

Number of core = 2:
n_get_bulk=1 n_put_bulk=1 n_keep=32 %_change_with_patch=18.21
n_get_bulk=1 n_put_bulk=1 n_keep=128 %_change_with_patch=21.89
n_get_bulk=1 n_put_bulk=4 n_keep=32 %_change_with_patch=-21.21 (regression)
n_get_bulk=1 n_put_bulk=4 n_keep=128 %_change_with_patch=-17.05 (regression)
n_get_bulk=1 n_put_bulk=32 n_keep=32 %_change_with_patch=-26.09 (regression)
n_get_bulk=1 n_put_bulk=32 n_keep=128 %_change_with_patch=-23.49 (regression)
n_get_bulk=4 n_put_bulk=1 n_keep=32 %_change_with_patch=56.28
n_get_bulk=4 n_put_bulk=1 n_keep=128 %_change_with_patch=67.69
n_get_bulk=4 n_put_bulk=4 n_keep=32 %_change_with_patch=1.45
n_get_bulk=4 n_put_bulk=4 n_keep=128 %_change_with_patch=8.84
n_get_bulk=4 n_put_bulk=32 n_keep=32 %_change_with_patch=-5.27 (regression)
n_get_bulk=4 n_put_bulk=32 n_keep=128 %_change_with_patch=-3.09 (regression)
n_get_bulk=32 n_put_bulk=1 n_keep=32 %_change_with_patch=76.11
n_get_bulk=32 n_put_bulk=1 n_keep=128 %_change_with_patch=86.06
n_get_bulk=32 n_put_bulk=4 n_keep=32 %_change_with_patch=11.86
n_get_bulk=32 n_put_bulk=4 n_keep=128 %_change_with_patch=16.55
n_get_bulk=32 n_put_bulk=32 n_keep=32 %_change_with_patch=13.01
n_get_bulk=32 n_put_bulk=32 n_keep=128 %_change_with_patch=11.51


From analyzing the results, it is clear that for n_get_bulk and
n_put_bulk sizes of 32 there is no performance regression
IMO, the other sizes are not practical from performance perspective
and the regression in those cases can be safely ignored

An attempt to increase the size of mempool to 32GB, by dividing the
index by sizeof(uintptr_t), has led to a performance degradation of
~5% compared to the base performance
---
v2:
 - Increase size of mempool to 32GB (Morten)
 - Improve performance for other platforms using dual loop unrolling
---
Dharmik Thakkar (1):
  mempool: implement index-based per core cache

 lib/mempool/rte_mempool.h             | 150 +++++++++++++++++++++++++-
 lib/mempool/rte_mempool_ops_default.c |   7 ++
 2 files changed, 156 insertions(+), 1 deletion(-)

-- 
2.17.1


  parent reply	other threads:[~2022-01-13  5:36 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-30 17:27 [dpdk-dev] [RFC] " Dharmik Thakkar
2021-10-01 12:36 ` Jerin Jacob
2021-10-01 15:44   ` Honnappa Nagarahalli
2021-10-01 17:32     ` Jerin Jacob
2021-10-01 17:57       ` Honnappa Nagarahalli
2021-10-01 18:21       ` Jerin Jacob
2021-10-01 21:30 ` Ananyev, Konstantin
2021-10-02  0:07   ` Honnappa Nagarahalli
2021-10-02 18:51     ` Ananyev, Konstantin
2021-10-04 16:36       ` Honnappa Nagarahalli
2021-10-30 10:23         ` Morten Brørup
2021-10-31  8:14         ` Morten Brørup
2021-11-03 15:12           ` Dharmik Thakkar
2021-11-03 15:52             ` Morten Brørup
2021-11-04  4:42               ` Dharmik Thakkar
2021-11-04  8:04                 ` Morten Brørup
2021-11-08  4:32                   ` Honnappa Nagarahalli
2021-11-08  7:22                     ` Morten Brørup
2021-11-08 15:29                       ` Honnappa Nagarahalli
2021-11-08 15:39                         ` Morten Brørup
2021-11-08 15:46                           ` Honnappa Nagarahalli
2021-11-08 16:03                             ` Morten Brørup
2021-11-08 16:47                               ` Jerin Jacob
2021-12-24 22:59 ` [PATCH 0/1] " Dharmik Thakkar
2021-12-24 22:59   ` [PATCH 1/1] " Dharmik Thakkar
2022-01-11  2:26     ` Ananyev, Konstantin
2022-01-13  5:17       ` Dharmik Thakkar
2022-01-13 10:37         ` Ananyev, Konstantin
2022-01-19 15:32           ` Dharmik Thakkar
2022-01-21 11:25             ` Ananyev, Konstantin
2022-01-21 11:31               ` Ananyev, Konstantin
2022-03-24 19:51               ` Dharmik Thakkar
2021-12-25  0:16   ` [PATCH 0/1] " Morten Brørup
2022-01-07 11:15     ` Bruce Richardson
2022-01-07 11:29       ` Morten Brørup
2022-01-07 13:50         ` Bruce Richardson
2022-01-08  9:37           ` Morten Brørup
2022-01-10  6:38             ` Jerin Jacob
2022-01-13  5:31               ` Dharmik Thakkar
2022-01-13  5:36   ` Dharmik Thakkar [this message]
2022-01-13  5:36     ` [PATCH v2 1/1] " Dharmik Thakkar
2022-01-13 10:18       ` Jerin Jacob
2022-01-20  8:21       ` Morten Brørup
2022-01-21  6:01         ` Honnappa Nagarahalli
2022-01-21  7:36           ` Morten Brørup
2022-01-24 13:05             ` Ray Kinsella
2022-01-21  9:12           ` Bruce Richardson
2022-01-23  7:13       ` Wang, Haiyue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220113053630.886638-1-dharmik.thakkar@arm.com \
    --to=dharmik.thakkar@arm.com \
    --cc=dev@dpdk.org \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=nd@arm.com \
    --cc=ruifeng.wang@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

DPDK patches and discussions

This inbox may be cloned and mirrored by anyone:

	git clone --mirror http://inbox.dpdk.org/dev/0 dev/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 dev dev/ http://inbox.dpdk.org/dev \
		dev@dpdk.org
	public-inbox-index dev

Example config snippet for mirrors.
Newsgroup available over NNTP:
	nntp://inbox.dpdk.org/inbox.dpdk.dev


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git