From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Tue, 25 Jan 2022 13:56:35 +0100
From: Olivier Matz <olivier.matz@6wind.com>
To: Morten Brørup <mb@smartsharesystems.com>
Cc: dev@dpdk.org, andrew.rybchenko@oktetlabs.ru, bruce.richardson@intel.com,
 jerinjacobk@gmail.com, thomas@monjalon.net
Subject: Re: [PATCH v2] mempool: test performance with constant n
References: <20220119113732.40167-1-mb@smartsharesystems.com>
 <20220124145953.14281-1-olivier.matz@6wind.com>
 <98CBD80474FA8B44BF855DF32C47DC35D86E29@smartserver.smartshare.dk>
In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D86E29@smartserver.smartshare.dk>

Hi Morten,

On Mon, Jan 24, 2022 at 06:20:49PM +0100, Morten Brørup wrote:
> > From: Olivier Matz [mailto:olivier.matz@6wind.com]
> > Sent: Monday, 24 January 2022 16.00
> >
> > From: Morten Brørup <mb@smartsharesystems.com>
> >
> > "What gets measured gets done."
> >
> > This patch adds mempool performance tests where the number of objects
> > to put and get is constant at compile time, which may significantly
> > improve the performance of these functions. [*]
> >
> > Also, it is ensured that the array holding the objects used for
> > testing is cache line aligned, for maximum performance.
> >
> > And finally, the following entries are added to the list of tests:
> > - Number of kept objects: 512
> > - Number of objects to get and to put: the number of pointers fitting
> >   into a cache line, i.e. 8 or 16
> >
> > [*] Some example performance test (with cache) results:
> >
> > get_bulk=4 put_bulk=4 keep=128 constant_n=false rate_persec=280480972
> > get_bulk=4 put_bulk=4 keep=128 constant_n=true rate_persec=622159462
> >
> > get_bulk=8 put_bulk=8 keep=128 constant_n=false rate_persec=477967155
> > get_bulk=8 put_bulk=8 keep=128 constant_n=true rate_persec=917582643
> >
> > get_bulk=32 put_bulk=32 keep=32 constant_n=false rate_persec=871248691
> > get_bulk=32 put_bulk=32 keep=32 constant_n=true rate_persec=1134021836
> >
> > Signed-off-by: Morten Brørup <mb@smartsharesystems.com>
> > Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
> > ---
> >
> > Hi Morten,
> >
> > Here is the updated patch.
> >
> > I launched the mempool_perf test on my desktop machine, but I can't
> > reproduce the numbers: constant and non-constant n give almost the
> > same rate on my machine (it's even worse with constants). I tested
> > with your initial patch and with this one. Can you please try this
> > patch, and/or give some details about your test environment?
>
> Test environment:
> VMware virtual machine running Ubuntu 20.04.3 LTS.
> 4 CPUs and 8 GB RAM assigned.
> The physical CPU is a Xeon E5-2620 v4 with plenty of RAM.
> Although other VMs are running on the same server, it is not very
> oversubscribed.
>
> Hugepages established with:
> usertools/dpdk-hugepages.py -p 2M --setup 2G
>
> Build steps:
> meson -Dplatform=generic work
> cd work
> ninja
>
> > Here is what I get:
> >
> > with your patch:
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=152620236
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=144716595
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=306996838
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=287375359
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=977626723
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=963103944
>
> My test results were with an experimental, optimized version of the
> mempool library, which showed a larger difference. (This was the reason
> for updating the perf test - to measure the effects of optimizing the
> mempool library.)
>
> However, testing the patch (version 1) with a brand new git checkout
> still shows a huge difference, e.g.:
>
> mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=false rate_persec=501009612
> mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=true rate_persec=799014912
>
> You should also see a significant difference when testing.
>
> My rate_persec without constant n is 3x yours (501 M vs. 156 M ops/s),
> so the baseline seems wrong! I don't think our server rig is so much
> faster than your desktop machine. Perhaps mempool debug, telemetry or
> other background noise is polluting your test.
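A note on the mechanism being measured here: when the bulk count is a
compile-time constant, the compiler can specialize the get/put paths and
fully unroll their inner copy loops. Below is a minimal standalone sketch
of that runtime-to-constant dispatch idea; copy_objs() and copy_dispatch()
are illustrative names, not part of the patch or of DPDK:

/* Forced inlining means each call site below is compiled as its own
 * specialization of 'n'; with a constant 'n' the loop collapses into a
 * fixed sequence of moves. */
static inline __attribute__((always_inline)) void
copy_objs(void **dst, void * const *src, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		dst[i] = src[i];
}

/* Dispatch a runtime 'n' to compile-time constant specializations,
 * mirroring the if/else ladder the patch adds around test_loop(). */
static void
copy_dispatch(void **dst, void * const *src, unsigned int n)
{
	if (n == 4)
		copy_objs(dst, src, 4);		/* unrolled by the compiler */
	else if (n == 8)
		copy_objs(dst, src, 8);		/* unrolled by the compiler */
	else
		copy_objs(dst, src, n);		/* generic loop, runtime bound */
}

This also explains why the patch only replays a test with constant values
when n_get_bulk == n_put_bulk: each branch of the dispatch ladder passes a
single literal for both bulk sizes.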

Sorry, I just realized that I was indeed using a "debugoptimized" build.
It's much better in release mode:

mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=1425473536
mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=2159660236
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=2796342476
mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=4351577292
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=8589777300
mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=13560971258

Thanks,
Olivier
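For context on how the rate_persec figures above are produced: the test
runs a fixed-duration measurement loop and divides the operation count by
the elapsed time. A rough sketch of the pattern, simplified from
per_lcore_mempool_test() in app/test/test_mempool_perf.c; the workload()
callback is an illustrative stand-in for the bulk get/put loops, and error
handling is omitted:

#include <stdint.h>
#include <rte_cycles.h>

#define N	65536	/* operations per chunk, as in the test */
#define TIME_S	5	/* measurement duration in seconds */

/* Run the workload in chunks of N operations until TIME_S seconds of
 * timer time have elapsed, then report operations per second. */
static uint64_t
measure_rate(void (*workload)(void))
{
	const uint64_t hz = rte_get_timer_hz();
	uint64_t start_cycles = rte_get_timer_cycles();
	uint64_t time_diff = 0;
	uint64_t ops = 0;

	while (time_diff / hz < TIME_S) {
		workload();	/* performs N get/put operations */
		ops += N;
		time_diff = rte_get_timer_cycles() - start_cycles;
	}
	return ops / TIME_S;	/* approximately rate_persec */
}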
> >
> > with this patch:
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=156460646
> > mempool_autotest cache=512 cores=1 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=142173798
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=312410111
> > mempool_autotest cache=512 cores=2 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=281699942
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=0 rate_persec=983315247
> > mempool_autotest cache=512 cores=12 n_get_bulk=8 n_put_bulk=8 n_keep=128 constant_n=1 rate_persec=950350638
> >
> > v2:
> > - use a flag instead of a negative value to enable tests with
> >   compile-time constant
> > - use a static inline function instead of a macro
> > - remove some "noise" (do not change variable type when not required)
> >
> > Thanks,
> > Olivier
> >
> >  app/test/test_mempool_perf.c | 110 ++++++++++++++++++++++++-----------
> >  1 file changed, 77 insertions(+), 33 deletions(-)
> >
> > diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
> > index 87ad251367..ce7c6241ab 100644
> > --- a/app/test/test_mempool_perf.c
> > +++ b/app/test/test_mempool_perf.c
> > @@ -1,5 +1,6 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2022 SmartShare Systems
> >   */
> >
> >  #include <string.h>
> > @@ -55,19 +56,24 @@
> >   *
> >   *    - Bulk get from 1 to 32
> >   *    - Bulk put from 1 to 32
> > + *    - Bulk get and put from 1 to 32, compile time constant
> >   *
> >   *  - Number of kept objects (*n_keep*)
> >   *
> >   *    - 32
> >   *    - 128
> > + *    - 512
> >   */
> >
> >  #define N 65536
> >  #define TIME_S 5
> >  #define MEMPOOL_ELT_SIZE 2048
> > -#define MAX_KEEP 128
> > +#define MAX_KEEP 512
> >  #define MEMPOOL_SIZE ((rte_lcore_count()*(MAX_KEEP+RTE_MEMPOOL_CACHE_MAX_SIZE))-1)
> >
> > +/* Number of pointers fitting into one cache line. */
> > +#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))
> > +
> >  #define LOG_ERR() printf("test failed at %s():%d\n", __func__, __LINE__)
> >  #define RET_ERR() do {						\
> >  		LOG_ERR();					\
> > @@ -91,6 +97,9 @@ static unsigned n_put_bulk;
> >  /* number of objects retrieved from mempool before putting them back */
> >  static unsigned n_keep;
> >
> > +/* true if we want to test with constant n_get_bulk and n_put_bulk */
> > +static int use_constant_values;
> > +
> >  /* number of enqueues / dequeues */
> >  struct mempool_test_stats {
> >  	uint64_t enq_count;
> > @@ -111,11 +120,43 @@ my_obj_init(struct rte_mempool *mp, __rte_unused void *arg,
> >  	*objnum = i;
> >  }
> >
> > +static __rte_always_inline int
> > +test_loop(struct rte_mempool *mp, struct rte_mempool_cache *cache,
> > +	  unsigned int x_keep, unsigned int x_get_bulk, unsigned int x_put_bulk)
> > +{
> > +	void *obj_table[MAX_KEEP] __rte_cache_aligned;
> > +	unsigned int idx;
> > +	unsigned int i;
> > +	int ret;
> > +
> > +	for (i = 0; likely(i < (N / x_keep)); i++) {
> > +		/* get x_keep objects by bulk of x_get_bulk */
> > +		for (idx = 0; idx < x_keep; idx += x_get_bulk) {
> > +			ret = rte_mempool_generic_get(mp,
> > +						      &obj_table[idx],
> > +						      x_get_bulk,
> > +						      cache);
> > +			if (unlikely(ret < 0)) {
> > +				rte_mempool_dump(stdout, mp);
> > +				return ret;
> > +			}
> > +		}
> > +
> > +		/* put the objects back by bulk of x_put_bulk */
> > +		for (idx = 0; idx < x_keep; idx += x_put_bulk) {
> > +			rte_mempool_generic_put(mp,
> > +						&obj_table[idx],
> > +						x_put_bulk,
> > +						cache);
> > +		}
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> >  static int
> >  per_lcore_mempool_test(void *arg)
> >  {
> > -	void *obj_table[MAX_KEEP];
> > -	unsigned i, idx;
> >  	struct rte_mempool *mp = arg;
> >  	unsigned lcore_id = rte_lcore_id();
> >  	int ret = 0;
> > @@ -139,6 +180,9 @@ per_lcore_mempool_test(void *arg)
> >  		GOTO_ERR(ret, out);
> >  	if (((n_keep / n_put_bulk) * n_put_bulk) != n_keep)
> >  		GOTO_ERR(ret, out);
> > +	/* for constant n, n_get_bulk and n_put_bulk must be the same */
> > +	if (use_constant_values && n_put_bulk != n_get_bulk)
> > +		GOTO_ERR(ret, out);
> >
> >  	stats[lcore_id].enq_count = 0;
> >
> > @@ -149,31 +193,23 @@ per_lcore_mempool_test(void *arg)
> >  	start_cycles = rte_get_timer_cycles();
> >
> >  	while (time_diff/hz < TIME_S) {
> > -		for (i = 0; likely(i < (N/n_keep)); i++) {
> > -			/* get n_keep objects by bulk of n_bulk */
> > -			idx = 0;
> > -			while (idx < n_keep) {
> > -				ret = rte_mempool_generic_get(mp,
> > -							      &obj_table[idx],
> > -							      n_get_bulk,
> > -							      cache);
> > -				if (unlikely(ret < 0)) {
> > -					rte_mempool_dump(stdout, mp);
> > -					/* in this case, objects are lost... */
> > -					GOTO_ERR(ret, out);
> > -				}
> > -				idx += n_get_bulk;
> > -			}
> > +		if (!use_constant_values)
> > +			ret = test_loop(mp, cache, n_keep, n_get_bulk, n_put_bulk);
> > +		else if (n_get_bulk == 1)
> > +			ret = test_loop(mp, cache, n_keep, 1, 1);
> > +		else if (n_get_bulk == 4)
> > +			ret = test_loop(mp, cache, n_keep, 4, 4);
> > +		else if (n_get_bulk == CACHE_LINE_BURST)
> > +			ret = test_loop(mp, cache, n_keep,
> > +					CACHE_LINE_BURST, CACHE_LINE_BURST);
> > +		else if (n_get_bulk == 32)
> > +			ret = test_loop(mp, cache, n_keep, 32, 32);
> > +		else
> > +			ret = -1;
> > +
> > +		if (ret < 0)
> > +			GOTO_ERR(ret, out);
> >
> > -			/* put the objects back */
> > -			idx = 0;
> > -			while (idx < n_keep) {
> > -				rte_mempool_generic_put(mp, &obj_table[idx],
> > -							n_put_bulk,
> > -							cache);
> > -				idx += n_put_bulk;
> > -			}
> > -		}
> >  		end_cycles = rte_get_timer_cycles();
> >  		time_diff = end_cycles - start_cycles;
> >  		stats[lcore_id].enq_count += N;
> > @@ -203,10 +239,10 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> >  	memset(stats, 0, sizeof(stats));
> >
> >  	printf("mempool_autotest cache=%u cores=%u n_get_bulk=%u "
> > -	       "n_put_bulk=%u n_keep=%u ",
> > +	       "n_put_bulk=%u n_keep=%u constant_n=%u ",
> >  	       use_external_cache ?
> >  		   external_cache_size : (unsigned) mp->cache_size,
> > -	       cores, n_get_bulk, n_put_bulk, n_keep);
> > +	       cores, n_get_bulk, n_put_bulk, n_keep, use_constant_values);
> >
> >  	if (rte_mempool_avail_count(mp) != MEMPOOL_SIZE) {
> >  		printf("mempool is not full\n");
> > @@ -253,9 +289,9 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
> >  static int
> >  do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> >  {
> > -	unsigned bulk_tab_get[] = { 1, 4, 32, 0 };
> > -	unsigned bulk_tab_put[] = { 1, 4, 32, 0 };
> > -	unsigned keep_tab[] = { 32, 128, 0 };
> > +	unsigned int bulk_tab_get[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> > +	unsigned int bulk_tab_put[] = { 1, 4, CACHE_LINE_BURST, 32, 0 };
> > +	unsigned int keep_tab[] = { 32, 128, 512, 0 };
> >  	unsigned *get_bulk_ptr;
> >  	unsigned *put_bulk_ptr;
> >  	unsigned *keep_ptr;
> > @@ -265,13 +301,21 @@ do_one_mempool_test(struct rte_mempool *mp, unsigned int cores)
> >  		for (put_bulk_ptr = bulk_tab_put; *put_bulk_ptr; put_bulk_ptr++) {
> >  			for (keep_ptr = keep_tab; *keep_ptr; keep_ptr++) {
> >
> > +				use_constant_values = 0;
> >  				n_get_bulk = *get_bulk_ptr;
> >  				n_put_bulk = *put_bulk_ptr;
> >  				n_keep = *keep_ptr;
> >  				ret = launch_cores(mp, cores);
> > -
> >  				if (ret < 0)
> >  					return -1;
> > +
> > +				/* replay test with constant values */
> > +				if (n_get_bulk == n_put_bulk) {
> > +					use_constant_values = 1;
> > +					ret = launch_cores(mp, cores);
> > +					if (ret < 0)
> > +						return -1;
> > +				}
> >  			}
> >  		}
> >  	}
> > --
> > 2.30.2
>
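A closing note on the CACHE_LINE_BURST constant added by the patch: it is
the number of pointers fitting into one cache line, which the commit
message summarizes as "8 or 16". A small standalone sanity-check sketch,
not part of the patch:

#include <stdint.h>
#include <rte_common.h>	/* RTE_CACHE_LINE_SIZE */

/* Same definition as the patch adds to test_mempool_perf.c. */
#define CACHE_LINE_BURST (RTE_CACHE_LINE_SIZE / sizeof(uintptr_t))

/* 64-byte cache lines with 8-byte pointers give 8; platforms built with
 * 128-byte cache lines (or 4-byte pointers) give 16, matching the
 * "8 or 16" of the commit message. */
_Static_assert(CACHE_LINE_BURST == 8 || CACHE_LINE_BURST == 16,
	       "unexpected number of pointers per cache line");

The cache-line alignment of obj_table (__rte_cache_aligned in test_loop())
complements this: each burst of CACHE_LINE_BURST pointers then starts on a
cache line boundary.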