From: Aaron Conole <aconole@redhat.com>
To: Phil Yang <Phil.Yang@arm.com>
Cc: "maicolgabriel\@hotmail.com" <maicolgabriel@hotmail.com>,
"dev\@dpdk.org" <dev@dpdk.org>,
"david.marchand\@redhat.com" <david.marchand@redhat.com>,
"drc\@linux.vnet.ibm.com" <drc@linux.vnet.ibm.com>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
Ruifeng Wang <Ruifeng.Wang@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [PATCH 1/2] test/mcslock: move performance test to perf tests
Date: Thu, 23 Jul 2020 11:14:37 -0400 [thread overview]
Message-ID: <f7tmu3qfele.fsf@dhcp-25.97.bos.redhat.com>
In-Reply-To: <VE1PR08MB4640F1A5CCAD8C82D90F5CD1E9760@VE1PR08MB4640.eurprd08.prod.outlook.com> (Phil Yang's message of "Thu, 23 Jul 2020 02:36:23 +0000")
Phil Yang <Phil.Yang@arm.com> writes:
> Hi Aaron,
>
> It seems Travis CI doesn't catch this timeout issue.
That's probably because we use -t 3 to multiply the timeout. We needed
to do that because a few of the tests were on the cusp of failure in the
Travis environment, since it's a bit lower powered. On brawnier
systems, they probably aren't an issue.
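
For reference, meson applies the -t value as a multiplier on each
test's configured timeout, so the 10 s limit seen in the logs below
becomes 30 s in CI. A minimal sketch of such an invocation (the exact
flags in our CI config may differ, so treat this as illustrative only):

  $ meson test -C build --suite DPDK:fast-tests -t 3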
> But running locally with the default timeout configuration records these two cases as TIMEOUT.
>
> 1. The test results on an x86 server (72 lcores online @ 2.60GHz) are:
> a. Default setting:
> $ sudo meson test -C build_test --suite DPDK:fast-tests
> 3/92 DPDK:fast-tests / atomic_autotest TIMEOUT 10.32s
> 39/92 DPDK:fast-tests / mcslock_autotest TIMEOUT 10.32s
>
> b. Extend timeout:
> $ sudo meson test -C build_test --suite DPDK:fast-tests -t 30
> 3/92 DPDK:fast-tests / atomic_autotest OK 173.01s
> 39/92 DPDK:fast-tests / mcslock_autotest OK 12.19s
>
> 2. The test results on an aarch64 server (56 lcores online @ 2.0GHz) are:
> a. Default setting:
> $ sudo meson test -C build_test --suite DPDK:fast-tests
> 3/92 DPDK:fast-tests / atomic_autotest TIMEOUT 10.47 s
> 39/92 DPDK:fast-tests / mcslock_autotest TIMEOUT 10.47 s
>
> b. Extend timeout:
> $ sudo meson test -C build_test --suite DPDK:fast-tests -t 60
> 3/92 DPDK:fast-tests / atomic_autotest OK 431.89 s
> 39/92 DPDK:fast-tests / mcslock_autotest OK 21.25 s
In all of these cases the mcslock_autotest is really not too far off
the limit, but the atomic_autotest runs really long no matter what.
> So I think these two patches can resolve this issue. I'd love to hear your thoughts.
In general, I'm always happy to see perf tests moved under the perf
area where they belong. I don't have a strong opinion on this specific
patch, though.
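
As a usage note: once the test is moved, it should drop out of the fast
suite, and assuming the suite naming follows the existing
DPDK:fast-tests pattern, it could be run on demand with something like:

  $ meson test -C build --suite DPDK:perf-tests mcslock_perf_autotest

(illustrative only; the actual suite and test names depend on how
meson.build registers them).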
> Thanks,
> Phil
>
> <snip>
>
>> Subject: [dpdk-dev] [PATCH 1/2] test/mcslock: move performance test to
>> perf tests
>>
>> The MCS lock performance test takes more than 10 seconds and leads
>> to meson test timeout on some platforms. Move the performance test
>> into perf tests.
>>
>> Signed-off-by: Phil Yang <phil.yang@arm.com>
>> Reviewed-by: Gavin Hu <gavin.hu@arm.com>
>> ---
>> MAINTAINERS | 1 +
>> app/test/Makefile | 1 +
>> app/test/autotest_data.py | 6 +++
>> app/test/meson.build | 2 +
>> app/test/test_mcslock.c | 88 -------------------------------
>> app/test/test_mcslock_perf.c | 121 +++++++++++++++++++++++++++++++++++++++++++
>> 6 files changed, 131 insertions(+), 88 deletions(-)
>> create mode 100644 app/test/test_mcslock_perf.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index db235c2..411bdeb 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -247,6 +247,7 @@ MCSlock - EXPERIMENTAL
>> M: Phil Yang <phil.yang@arm.com>
>> F: lib/librte_eal/common/include/generic/rte_mcslock.h
>> F: app/test/test_mcslock.c
>> +F: app/test/test_mcslock_perf.c
>>
>> Ticketlock
>> M: Joyce Kong <joyce.kong@arm.com>
>> diff --git a/app/test/Makefile b/app/test/Makefile
>> index 1f080d1..97de3ac 100644
>> --- a/app/test/Makefile
>> +++ b/app/test/Makefile
>> @@ -65,6 +65,7 @@ SRCS-y += test_barrier.c
>> SRCS-y += test_malloc.c
>> SRCS-y += test_cycles.c
>> SRCS-y += test_mcslock.c
>> +SRCS-y += test_mcslock_perf.c
>> SRCS-y += test_spinlock.c
>> SRCS-y += test_ticketlock.c
>> SRCS-y += test_memory.c
>> diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
>> index 7b1d013..2a4619d 100644
>> --- a/app/test/autotest_data.py
>> +++ b/app/test/autotest_data.py
>> @@ -784,6 +784,12 @@
>> "Func": default_autotest,
>> "Report": None,
>> },
>> + {
>> + "Name": "MCS Lock performance autotest",
>> + "Command": "mcslock_perf_autotest",
>> + "Func": default_autotest,
>> + "Report": None,
>> + },
>> #
>> # Please always make sure that ring_perf is the last test!
>> #
>> diff --git a/app/test/meson.build b/app/test/meson.build
>> index 0a2ce71..335a869 100644
>> --- a/app/test/meson.build
>> +++ b/app/test/meson.build
>> @@ -82,6 +82,7 @@ test_sources = files('commands.c',
>> 'test_meter.c',
>> 'test_metrics.c',
>> 'test_mcslock.c',
>> + 'test_mcslock_perf.c',
>> 'test_mp_secondary.c',
>> 'test_per_lcore.c',
>> 'test_pmd_perf.c',
>> @@ -270,6 +271,7 @@ perf_test_names = [
>> 'rand_perf_autotest',
>> 'hash_readwrite_perf_autotest',
>> 'hash_readwrite_lf_perf_autotest',
>> + 'mcslock_perf_autotest',
>> ]
>>
>> driver_test_names = [
>> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
>> index e9359df..15f9751 100644
>> --- a/app/test/test_mcslock.c
>> +++ b/app/test/test_mcslock.c
>> @@ -32,23 +32,16 @@
>> *
>> * - The function takes the global lock, display something, then releases
>> * the global lock on each core.
>> - *
>> - * - A load test is carried out, with all cores attempting to lock a single
>> - * lock multiple times.
>> */
>>
>> RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
>> RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
>> -RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
>>
>> rte_mcslock_t *p_ml;
>> rte_mcslock_t *p_ml_try;
>> -rte_mcslock_t *p_ml_perf;
>>
>> static unsigned int count;
>>
>> -static rte_atomic32_t synchro;
>> -
>> static int
>> test_mcslock_per_core(__attribute__((unused)) void *arg)
>> {
>> @@ -63,85 +56,8 @@ test_mcslock_per_core(__attribute__((unused)) void *arg)
>> return 0;
>> }
>>
>> -static uint64_t time_count[RTE_MAX_LCORE] = {0};
>> -
>> #define MAX_LOOP 1000000
>>
>> -static int
>> -load_loop_fn(void *func_param)
>> -{
>> - uint64_t time_diff = 0, begin;
>> - uint64_t hz = rte_get_timer_hz();
>> - volatile uint64_t lcount = 0;
>> - const int use_lock = *(int *)func_param;
>> - const unsigned int lcore = rte_lcore_id();
>> -
>> - /**< Per core me node. */
>> - rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
>> -
>> - /* wait synchro */
>> - while (rte_atomic32_read(&synchro) == 0)
>> - ;
>> -
>> - begin = rte_get_timer_cycles();
>> - while (lcount < MAX_LOOP) {
>> - if (use_lock)
>> - rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
>> -
>> - lcount++;
>> - if (use_lock)
>> - rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
>> - }
>> - time_diff = rte_get_timer_cycles() - begin;
>> - time_count[lcore] = time_diff * 1000000 / hz;
>> - return 0;
>> -}
>> -
>> -static int
>> -test_mcslock_perf(void)
>> -{
>> - unsigned int i;
>> - uint64_t total = 0;
>> - int lock = 0;
>> - const unsigned int lcore = rte_lcore_id();
>> -
>> - printf("\nTest with no lock on single core...\n");
>> - rte_atomic32_set(&synchro, 1);
>> - load_loop_fn(&lock);
>> - printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> - lcore, time_count[lcore]);
>> - memset(time_count, 0, sizeof(time_count));
>> -
>> - printf("\nTest with lock on single core...\n");
>> - lock = 1;
>> - rte_atomic32_set(&synchro, 1);
>> - load_loop_fn(&lock);
>> - printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> - lcore, time_count[lcore]);
>> - memset(time_count, 0, sizeof(time_count));
>> -
>> - printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>> -
>> - rte_atomic32_set(&synchro, 0);
>> - rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
>> -
>> - /* start synchro and launch test on master */
>> - rte_atomic32_set(&synchro, 1);
>> - load_loop_fn(&lock);
>> -
>> - rte_eal_mp_wait_lcore();
>> -
>> - RTE_LCORE_FOREACH(i) {
>> - printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> - i, time_count[i]);
>> - total += time_count[i];
>> - }
>> -
>> - printf("Total Cost Time = %"PRIu64" us\n", total);
>> -
>> - return 0;
>> -}
>> -
>> /*
>> * Use rte_mcslock_trylock() to trylock a mcs lock object,
>> * If it could not lock the object successfully, it would
>> @@ -240,10 +156,6 @@ test_mcslock(void)
>> ret = -1;
>> rte_mcslock_unlock(&p_ml, &ml_me);
>>
>> - /* mcs lock perf test */
>> - if (test_mcslock_perf() < 0)
>> - return -1;
>> -
>> return ret;
>> }
>>
>> diff --git a/app/test/test_mcslock_perf.c b/app/test/test_mcslock_perf.c
>> new file mode 100644
>> index 0000000..6948344
>> --- /dev/null
>> +++ b/app/test/test_mcslock_perf.c
>> @@ -0,0 +1,121 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2019 Arm Limited
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +#include <string.h>
>> +#include <unistd.h>
>> +#include <sys/queue.h>
>> +
>> +#include <rte_common.h>
>> +#include <rte_memory.h>
>> +#include <rte_per_lcore.h>
>> +#include <rte_launch.h>
>> +#include <rte_eal.h>
>> +#include <rte_lcore.h>
>> +#include <rte_cycles.h>
>> +#include <rte_mcslock.h>
>> +#include <rte_atomic.h>
>> +
>> +#include "test.h"
>> +
>> +/*
>> + * RTE MCS lock perf test
>> + * ======================
>> + *
>> + * These tests are derived from spin lock perf test cases.
>> + *
>> + * - A load test is carried out, with all cores attempting to lock a single
>> + * lock multiple times.
>> + */
>> +
>> +RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
>> +rte_mcslock_t *p_ml_perf;
>> +
>> +static rte_atomic32_t synchro;
>> +static uint64_t time_count[RTE_MAX_LCORE] = {0};
>> +
>> +#define MAX_LOOP 1000000
>> +
>> +static int
>> +load_loop_fn(void *func_param)
>> +{
>> + uint64_t time_diff = 0, begin;
>> + uint64_t hz = rte_get_timer_hz();
>> + volatile uint64_t lcount = 0;
>> + const int use_lock = *(int *)func_param;
>> + const unsigned int lcore = rte_lcore_id();
>> +
>> + /**< Per core me node. */
>> + rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
>> +
>> + /* wait synchro */
>> + while (rte_atomic32_read(&synchro) == 0)
>> + ;
>> +
>> + begin = rte_get_timer_cycles();
>> + while (lcount < MAX_LOOP) {
>> + if (use_lock)
>> + rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
>> +
>> + lcount++;
>> + if (use_lock)
>> + rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
>> + }
>> + time_diff = rte_get_timer_cycles() - begin;
>> + time_count[lcore] = time_diff * 1000000 / hz;
>> + return 0;
>> +}
>> +
>> +/*
>> + * Test rte_eal_get_lcore_state() in addition to mcs locks
>> + * as we have "waiting" then "running" lcores.
>> + */
>> +static int
>> +test_mcslock_perf(void)
>> +{
>> + unsigned int i;
>> + uint64_t total = 0;
>> + int lock = 0;
>> + const unsigned int lcore = rte_lcore_id();
>> +
>> + printf("\nTest with no lock on single core...\n");
>> + rte_atomic32_set(&synchro, 1);
>> + load_loop_fn(&lock);
>> + printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> + lcore, time_count[lcore]);
>> + memset(time_count, 0, sizeof(time_count));
>> +
>> + printf("\nTest with lock on single core...\n");
>> + lock = 1;
>> + rte_atomic32_set(&synchro, 1);
>> + load_loop_fn(&lock);
>> + printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> + lcore, time_count[lcore]);
>> + memset(time_count, 0, sizeof(time_count));
>> +
>> + printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>> +
>> + rte_atomic32_set(&synchro, 0);
>> + rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
>> +
>> + /* start synchro and launch test on master */
>> + rte_atomic32_set(&synchro, 1);
>> + load_loop_fn(&lock);
>> +
>> + rte_eal_mp_wait_lcore();
>> +
>> + RTE_LCORE_FOREACH(i) {
>> + printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> + i, time_count[i]);
>> + total += time_count[i];
>> + }
>> +
>> + printf("Total Cost Time = %"PRIu64" us\n", total);
>> +
>> + return 0;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(mcslock_perf_autotest, test_mcslock_perf);
>> --
>> 2.7.4
Thread overview: 10+ messages
2020-03-23 4:16 Phil Yang
2020-03-23 4:16 ` [dpdk-dev] [PATCH 2/2] test/atomic: reduce the number of loops to avoid timeouts Phil Yang
2020-07-28 3:41 ` [dpdk-dev] [PATCH v2] " Phil Yang
2020-09-17 9:17 ` Juraj Linkeš
2023-06-13 3:37 ` Stephen Hemminger
2020-07-23 2:36 ` [dpdk-dev] [PATCH 1/2] test/mcslock: move performance test to perf tests Phil Yang
2020-07-23 15:14 ` Aaron Conole [this message]
2020-07-28 3:24 ` [dpdk-dev] [PATCH v2] " Phil Yang
2020-09-17 9:16 ` Juraj Linkeš
2023-06-12 21:23 ` [dpdk-dev] [PATCH 1/2] " Stephen Hemminger