From: Aaron Conole
To: Phil Yang
Cc: maicolgabriel@hotmail.com, dev@dpdk.org, david.marchand@redhat.com,
 drc@linux.vnet.ibm.com, Honnappa Nagarahalli, Ruifeng Wang, nd
Date: Thu, 23 Jul 2020 11:14:37 -0400
In-Reply-To: (Phil Yang's message of "Thu, 23 Jul 2020 02:36:23 +0000")
References: <1584936978-11899-1-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH 1/2] test/mcslock: move performance test to perf tests

Phil Yang writes:

> Hi Aaron,
>
> It seems Travis CI cannot capture this timeout issue.

That's probably because we use -t 3 to multiply the timeout. We needed
to do that because a few of the tests were on the cusp of failure in
the Travis environment, since it's a bit lower-powered. On brawnier
systems they probably aren't an issue.
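
(Concretely, that amounts to something like the following -- the build
directory and suite here are only illustrative, the exact invocation
lives in the CI scripts:

  $ meson test -C build --suite DPDK:fast-tests -t 3

i.e. every test gets three times its declared timeout before meson
gives up on it.)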
> But the local test records these two cases as TIMEOUT under the
> default timeout configuration.
>
> 1. The test results on an x86 server (72 lcores online @ 2.60GHz) are:
>
> a. Default setting:
> $ sudo meson test -C build_test --suite DPDK:fast-tests
>  3/92 DPDK:fast-tests / atomic_autotest  TIMEOUT 10.32s
> 39/92 DPDK:fast-tests / mcslock_autotest TIMEOUT 10.32s
>
> b. Extended timeout:
> $ sudo meson test -C build_test --suite DPDK:fast-tests -t 30
>  3/92 DPDK:fast-tests / atomic_autotest  OK 173.01s
> 39/92 DPDK:fast-tests / mcslock_autotest OK  12.19s
>
> 2. The test results on an aarch64 server (56 lcores online @ 2.0GHz) are:
>
> a. Default setting:
> $ sudo meson test -C build_test --suite DPDK:fast-tests
>  3/92 DPDK:fast-tests / atomic_autotest  TIMEOUT 10.47s
> 39/92 DPDK:fast-tests / mcslock_autotest TIMEOUT 10.47s
>
> b. Extended timeout:
> $ sudo meson test -C build_test --suite DPDK:fast-tests -t 60
>  3/92 DPDK:fast-tests / atomic_autotest  OK 431.89s
> 39/92 DPDK:fast-tests / mcslock_autotest OK  21.25s

In all the cases it seems the mcslock_autotest is really not too far
over the limit, but the atomic_autotest runs really long no matter what.

> So I think these two patches can resolve this issue. I'd love to hear
> your thoughts.

In general, I'm always happy to see perf tests more appropriately
placed under the perf area. I don't have much of an opinion on this
specific patch, though.

> Thanks,
> Phil
>
>
>> Subject: [dpdk-dev] [PATCH 1/2] test/mcslock: move performance test to
>> perf tests
>>
>> The MCS lock performance test takes more than 10 seconds and leads
>> to meson test timeout on some platforms. Move the performance test
>> into perf tests.
>>
>> Signed-off-by: Phil Yang
>> Reviewed-by: Gavin Hu
>> ---
>>  MAINTAINERS                  |   1 +
>>  app/test/Makefile            |   1 +
>>  app/test/autotest_data.py    |   6 +++
>>  app/test/meson.build         |   2 +
>>  app/test/test_mcslock.c      |  88 -------------------------------
>>  app/test/test_mcslock_perf.c | 121 +++++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 131 insertions(+), 88 deletions(-)
>>  create mode 100644 app/test/test_mcslock_perf.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index db235c2..411bdeb 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -247,6 +247,7 @@ MCSlock - EXPERIMENTAL
>>  M: Phil Yang
>>  F: lib/librte_eal/common/include/generic/rte_mcslock.h
>>  F: app/test/test_mcslock.c
>> +F: app/test/test_mcslock_perf.c
>>
>>  Ticketlock
>>  M: Joyce Kong
>> diff --git a/app/test/Makefile b/app/test/Makefile
>> index 1f080d1..97de3ac 100644
>> --- a/app/test/Makefile
>> +++ b/app/test/Makefile
>> @@ -65,6 +65,7 @@ SRCS-y += test_barrier.c
>>  SRCS-y += test_malloc.c
>>  SRCS-y += test_cycles.c
>>  SRCS-y += test_mcslock.c
>> +SRCS-y += test_mcslock_perf.c
>>  SRCS-y += test_spinlock.c
>>  SRCS-y += test_ticketlock.c
>>  SRCS-y += test_memory.c
>> diff --git a/app/test/autotest_data.py b/app/test/autotest_data.py
>> index 7b1d013..2a4619d 100644
>> --- a/app/test/autotest_data.py
>> +++ b/app/test/autotest_data.py
>> @@ -784,6 +784,12 @@
>>          "Func":    default_autotest,
>>          "Report":  None,
>>      },
>> +    {
>> +        "Name":    "MCS Lock performance autotest",
>> +        "Command": "mcslock_perf_autotest",
>> +        "Func":    default_autotest,
>> +        "Report":  None,
>> +    },
>>      #
>>      # Please always make sure that ring_perf is the last test!
>>      #
>> diff --git a/app/test/meson.build b/app/test/meson.build
>> index 0a2ce71..335a869 100644
>> --- a/app/test/meson.build
>> +++ b/app/test/meson.build
>> @@ -82,6 +82,7 @@ test_sources = files('commands.c',
>>          'test_meter.c',
>>          'test_metrics.c',
>>          'test_mcslock.c',
>> +        'test_mcslock_perf.c',
>>          'test_mp_secondary.c',
>>          'test_per_lcore.c',
>>          'test_pmd_perf.c',
>> @@ -270,6 +271,7 @@ perf_test_names = [
>>          'rand_perf_autotest',
>>          'hash_readwrite_perf_autotest',
>>          'hash_readwrite_lf_perf_autotest',
>> +        'mcslock_perf_autotest',
>>  ]
>>
>>  driver_test_names = [
>> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
>> index e9359df..15f9751 100644
>> --- a/app/test/test_mcslock.c
>> +++ b/app/test/test_mcslock.c
>> @@ -32,23 +32,16 @@
>>   *
>>   * - The function takes the global lock, display something, then releases
>>   *   the global lock on each core.
>> - *
>> - * - A load test is carried out, with all cores attempting to lock a single
>> - *   lock multiple times.
>>   */
>>
>>  RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_me);
>>  RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_try_me);
>> -RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
>>
>>  rte_mcslock_t *p_ml;
>>  rte_mcslock_t *p_ml_try;
>> -rte_mcslock_t *p_ml_perf;
>>
>>  static unsigned int count;
>>
>> -static rte_atomic32_t synchro;
>> -
>>  static int
>>  test_mcslock_per_core(__attribute__((unused)) void *arg)
>>  {
>> @@ -63,85 +56,8 @@ test_mcslock_per_core(__attribute__((unused)) void *arg)
>>  	return 0;
>>  }
>>
>> -static uint64_t time_count[RTE_MAX_LCORE] = {0};
>> -
>>  #define MAX_LOOP 1000000
>>
>> -static int
>> -load_loop_fn(void *func_param)
>> -{
>> -	uint64_t time_diff = 0, begin;
>> -	uint64_t hz = rte_get_timer_hz();
>> -	volatile uint64_t lcount = 0;
>> -	const int use_lock = *(int *)func_param;
>> -	const unsigned int lcore = rte_lcore_id();
>> -
>> -	/**< Per core me node. */
>> -	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
>> -
>> -	/* wait synchro */
>> -	while (rte_atomic32_read(&synchro) == 0)
>> -		;
>> -
>> -	begin = rte_get_timer_cycles();
>> -	while (lcount < MAX_LOOP) {
>> -		if (use_lock)
>> -			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
>> -
>> -		lcount++;
>> -		if (use_lock)
>> -			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
>> -	}
>> -	time_diff = rte_get_timer_cycles() - begin;
>> -	time_count[lcore] = time_diff * 1000000 / hz;
>> -	return 0;
>> -}
>> -
>> -static int
>> -test_mcslock_perf(void)
>> -{
>> -	unsigned int i;
>> -	uint64_t total = 0;
>> -	int lock = 0;
>> -	const unsigned int lcore = rte_lcore_id();
>> -
>> -	printf("\nTest with no lock on single core...\n");
>> -	rte_atomic32_set(&synchro, 1);
>> -	load_loop_fn(&lock);
>> -	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> -			lcore, time_count[lcore]);
>> -	memset(time_count, 0, sizeof(time_count));
>> -
>> -	printf("\nTest with lock on single core...\n");
>> -	lock = 1;
>> -	rte_atomic32_set(&synchro, 1);
>> -	load_loop_fn(&lock);
>> -	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> -			lcore, time_count[lcore]);
>> -	memset(time_count, 0, sizeof(time_count));
>> -
>> -	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>> -
>> -	rte_atomic32_set(&synchro, 0);
>> -	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
>> -
>> -	/* start synchro and launch test on master */
>> -	rte_atomic32_set(&synchro, 1);
>> -	load_loop_fn(&lock);
>> -
>> -	rte_eal_mp_wait_lcore();
>> -
>> -	RTE_LCORE_FOREACH(i) {
>> -		printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> -				i, time_count[i]);
>> -		total += time_count[i];
>> -	}
>> -
>> -	printf("Total Cost Time = %"PRIu64" us\n", total);
>> -
>> -	return 0;
>> -}
>> -
>>  /*
>>   * Use rte_mcslock_trylock() to trylock a mcs lock object,
>>   * If it could not lock the object successfully, it would
>> @@ -240,10 +156,6 @@ test_mcslock(void)
>>  		ret = -1;
>>  	rte_mcslock_unlock(&p_ml, &ml_me);
>>
>> -	/* mcs lock perf test */
>> -	if (test_mcslock_perf() < 0)
>> -		return -1;
>> -
>>  	return ret;
>>  }
>>
>> diff --git a/app/test/test_mcslock_perf.c b/app/test/test_mcslock_perf.c
>> new file mode 100644
>> index 0000000..6948344
>> --- /dev/null
>> +++ b/app/test/test_mcslock_perf.c
>> @@ -0,0 +1,121 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2019 Arm Limited
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdint.h>
>> +#include <inttypes.h>
>> +#include <string.h>
>> +#include <unistd.h>
>> +#include <sys/queue.h>
>> +
>> +#include <rte_common.h>
>> +#include <rte_memory.h>
>> +#include <rte_per_lcore.h>
>> +#include <rte_launch.h>
>> +#include <rte_atomic.h>
>> +#include <rte_eal.h>
>> +#include <rte_lcore.h>
>> +#include <rte_cycles.h>
>> +#include <rte_mcslock.h>
>> +
>> +#include "test.h"
>> +
>> +/*
>> + * RTE MCS lock perf test
>> + * ======================
>> + *
>> + * These tests are derived from spin lock perf test cases.
>> + *
>> + * - A load test is carried out, with all cores attempting to lock a single
>> + *   lock multiple times.
>> + */
>> +
>> +RTE_DEFINE_PER_LCORE(rte_mcslock_t, _ml_perf_me);
>> +rte_mcslock_t *p_ml_perf;
>> +
>> +static rte_atomic32_t synchro;
>> +static uint64_t time_count[RTE_MAX_LCORE] = {0};
>> +
>> +#define MAX_LOOP 1000000
>> +
>> +static int
>> +load_loop_fn(void *func_param)
>> +{
>> +	uint64_t time_diff = 0, begin;
>> +	uint64_t hz = rte_get_timer_hz();
>> +	volatile uint64_t lcount = 0;
>> +	const int use_lock = *(int *)func_param;
>> +	const unsigned int lcore = rte_lcore_id();
>> +
>> +	/**< Per core me node. */
>> +	rte_mcslock_t ml_perf_me = RTE_PER_LCORE(_ml_perf_me);
>> +
>> +	/* wait synchro */
>> +	while (rte_atomic32_read(&synchro) == 0)
>> +		;
>> +
>> +	begin = rte_get_timer_cycles();
>> +	while (lcount < MAX_LOOP) {
>> +		if (use_lock)
>> +			rte_mcslock_lock(&p_ml_perf, &ml_perf_me);
>> +
>> +		lcount++;
>> +		if (use_lock)
>> +			rte_mcslock_unlock(&p_ml_perf, &ml_perf_me);
>> +	}
>> +	time_diff = rte_get_timer_cycles() - begin;
>> +	time_count[lcore] = time_diff * 1000000 / hz;
>> +	return 0;
>> +}
>> +
>> +/*
>> + * Test rte_eal_get_lcore_state() in addition to mcs locks
>> + * as we have "waiting" then "running" lcores.
>> + */
>> +static int
>> +test_mcslock_perf(void)
>> +{
>> +	unsigned int i;
>> +	uint64_t total = 0;
>> +	int lock = 0;
>> +	const unsigned int lcore = rte_lcore_id();
>> +
>> +	printf("\nTest with no lock on single core...\n");
>> +	rte_atomic32_set(&synchro, 1);
>> +	load_loop_fn(&lock);
>> +	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> +			lcore, time_count[lcore]);
>> +	memset(time_count, 0, sizeof(time_count));
>> +
>> +	printf("\nTest with lock on single core...\n");
>> +	lock = 1;
>> +	rte_atomic32_set(&synchro, 1);
>> +	load_loop_fn(&lock);
>> +	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> +			lcore, time_count[lcore]);
>> +	memset(time_count, 0, sizeof(time_count));
>> +
>> +	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>> +
>> +	rte_atomic32_set(&synchro, 0);
>> +	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MASTER);
>> +
>> +	/* start synchro and launch test on master */
>> +	rte_atomic32_set(&synchro, 1);
>> +	load_loop_fn(&lock);
>> +
>> +	rte_eal_mp_wait_lcore();
>> +
>> +	RTE_LCORE_FOREACH(i) {
>> +		printf("Core [%u] Cost Time = %"PRIu64" us\n",
>> +				i, time_count[i]);
>> +		total += time_count[i];
>> +	}
>> +
>> +	printf("Total Cost Time = %"PRIu64" us\n", total);
>> +
>> +	return 0;
>> +}
>> +
>> +REGISTER_TEST_COMMAND(mcslock_perf_autotest, test_mcslock_perf);
>> --
>> 2.7.4
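
(A note for anyone wanting to run the relocated test once this lands:
with the meson.build hunk above it should end up in the perf-tests
suite, so something like the following should work -- the build
directory name is just an example:

  $ meson test -C build_test --suite DPDK:perf-tests mcslock_perf_autotest

or interactively from the test binary:

  $ ./build_test/app/test/dpdk-test
  RTE>> mcslock_perf_autotest

Since perf tests aren't part of fast-tests, the default CI timeout no
longer applies to it.)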