From: Mattias Rönnblom
To: Van@dpdk.org, Haaren@dpdk.org, Harry
CC: Honnappa Nagarahalli, Morten Brørup, nd, Mattias Rönnblom
Subject: [PATCH 1/6] service: reduce statistics overhead for parallel services
Date: Tue, 6 Sep 2022 18:13:47 +0200
Message-ID: <20220906161352.296110-1-mattias.ronnblom@ericsson.com>
In-Reply-To: <20220708125645.3141464-2-harry.van.haaren@intel.com>
References: <20220708125645.3141464-2-harry.van.haaren@intel.com>
List-Id: DPDK patches and discussions

Move the statistics from the service data structure to the per-lcore struct.
This eliminates contention for the counter cache lines, which decreases
the producer-side statistics overhead for services deployed across many
lcores.

Prior to this patch, enabling statistics for a service with a
per-service function call latency of 1000 clock cycles deployed across
16 cores on an Intel Xeon 6230N @ 2.3 GHz would incur a cost of ~10000
core clock cycles per service call. After this patch, the statistics
overhead is reduced to 22 clock cycles per call.

Signed-off-by: Mattias Rönnblom
---
 lib/eal/common/rte_service.c | 182 +++++++++++++++++++++++------------
 1 file changed, 121 insertions(+), 61 deletions(-)

diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 94cb056196..b5103f2a20 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -50,17 +50,8 @@ struct rte_service_spec_impl {
 	 * on currently.
 	 */
 	uint32_t num_mapped_cores;
-
-	/* 32-bit builds won't naturally align a uint64_t, so force alignment,
-	 * allowing regular reads to be atomic.
-	 */
-	uint64_t calls __rte_aligned(8);
-	uint64_t cycles_spent __rte_aligned(8);
 } __rte_cache_aligned;
 
-/* Mask used to ensure uint64_t 8 byte vars are naturally aligned. */
-#define RTE_SERVICE_STAT_ALIGN_MASK (8 - 1)
-
 /* the internal values of a service core */
 struct core_state {
 	/* map of services IDs are run on this core */
@@ -71,6 +62,7 @@ struct core_state {
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
 	uint64_t loops;
 	uint64_t calls_per_service[RTE_SERVICE_NUM_MAX];
+	uint64_t cycles_per_service[RTE_SERVICE_NUM_MAX];
 } __rte_cache_aligned;
 
 static uint32_t rte_service_count;
@@ -138,13 +130,16 @@ rte_service_finalize(void)
 	rte_service_library_initialized = 0;
 }
 
-/* returns 1 if service is registered and has not been unregistered
- * Returns 0 if service never registered, or has been unregistered
- */
-static inline int
+static inline bool
+service_registered(uint32_t id)
+{
+	return rte_services[id].internal_flags & SERVICE_F_REGISTERED;
+}
+
+static inline bool
 service_valid(uint32_t id)
 {
-	return !!(rte_services[id].internal_flags & SERVICE_F_REGISTERED);
+	return id < RTE_SERVICE_NUM_MAX && service_registered(id);
 }
 
 static struct rte_service_spec_impl *
@@ -155,7 +150,7 @@ service_get(uint32_t id)
 
 /* validate ID and retrieve service pointer, or return error value */
 #define SERVICE_VALID_GET_OR_ERR_RET(id, service, retval) do { \
-	if (id >= RTE_SERVICE_NUM_MAX || !service_valid(id)) \
+	if (!service_valid(id)) \
 		return retval; \
 	service = &rte_services[id]; \
 } while (0)
@@ -217,7 +212,7 @@ rte_service_get_by_name(const char *name, uint32_t *service_id)
 
 	int i;
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-		if (service_valid(i) &&
+		if (service_registered(i) &&
 				strcmp(name, rte_services[i].spec.name) == 0) {
 			*service_id = i;
 			return 0;
@@ -254,7 +249,7 @@ rte_service_component_register(const struct rte_service_spec *spec,
 		return -EINVAL;
 
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-		if (!service_valid(i)) {
+		if (!service_registered(i)) {
 			free_slot = i;
 			break;
 		}
@@ -366,29 +361,24 @@ service_runner_do_callback(struct rte_service_spec_impl *s,
 {
 	void *userdata = s->spec.callback_userdata;
 
-	/* Ensure the atomically stored variables are naturally aligned,
-	 * as required for regular loads to be atomic.
-	 */
-	RTE_BUILD_BUG_ON((offsetof(struct rte_service_spec_impl, calls)
-		& RTE_SERVICE_STAT_ALIGN_MASK) != 0);
-	RTE_BUILD_BUG_ON((offsetof(struct rte_service_spec_impl, cycles_spent)
-		& RTE_SERVICE_STAT_ALIGN_MASK) != 0);
-
 	if (service_stats_enabled(s)) {
 		uint64_t start = rte_rdtsc();
 		s->spec.callback(userdata);
 		uint64_t end = rte_rdtsc();
 		uint64_t cycles = end - start;
-		cs->calls_per_service[service_idx]++;
-		if (service_mt_safe(s)) {
-			__atomic_fetch_add(&s->cycles_spent, cycles, __ATOMIC_RELAXED);
-			__atomic_fetch_add(&s->calls, 1, __ATOMIC_RELAXED);
-		} else {
-			uint64_t cycles_new = s->cycles_spent + cycles;
-			uint64_t calls_new = s->calls++;
-			__atomic_store_n(&s->cycles_spent, cycles_new, __ATOMIC_RELAXED);
-			__atomic_store_n(&s->calls, calls_new, __ATOMIC_RELAXED);
-		}
+
+		/* The lcore service worker thread is the only writer,
+		 * and thus only a non-atomic load and an atomic store
+		 * is needed, and not the more expensive atomic
+		 * add.
+		 */
+		__atomic_store_n(&cs->calls_per_service[service_idx],
+				 cs->calls_per_service[service_idx] + 1,
+				 __ATOMIC_RELAXED);
+
+		__atomic_store_n(&cs->cycles_per_service[service_idx],
+				 cs->cycles_per_service[service_idx] + cycles,
+				 __ATOMIC_RELAXED);
 	} else
 		s->spec.callback(userdata);
 }
@@ -436,7 +426,7 @@ rte_service_may_be_active(uint32_t id)
 	int32_t lcore_count = rte_service_lcore_list(ids, RTE_MAX_LCORE);
 	int i;
 
-	if (id >= RTE_SERVICE_NUM_MAX || !service_valid(id))
+	if (!service_valid(id))
 		return -EINVAL;
 
 	for (i = 0; i < lcore_count; i++) {
@@ -483,16 +473,17 @@ service_runner_func(void *arg)
 	 */
 	while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
 			RUNSTATE_RUNNING) {
 		const uint64_t service_mask = cs->service_mask;
 
 		for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-			if (!service_valid(i))
+			if (!service_registered(i))
 				continue;
 			/* return value ignored as no change to code flow */
 			service_run(i, cs, service_mask, service_get(i), 1);
 		}
 
-		cs->loops++;
+		__atomic_store_n(&cs->loops, cs->loops + 1, __ATOMIC_RELAXED);
 	}
 
 	/* Use SEQ CST memory ordering to avoid any re-ordering around
@@ -608,8 +599,8 @@ static int32_t
 service_update(uint32_t sid, uint32_t lcore, uint32_t *set, uint32_t *enabled)
 {
 	/* validate ID, or return error value */
-	if (sid >= RTE_SERVICE_NUM_MAX || !service_valid(sid) ||
-			lcore >= RTE_MAX_LCORE || !lcore_states[lcore].is_service_core)
+	if (!service_valid(sid) || lcore >= RTE_MAX_LCORE ||
+	    !lcore_states[lcore].is_service_core)
 		return -EINVAL;
 
 	uint64_t sid_mask = UINT64_C(1) << sid;
@@ -813,21 +804,75 @@ rte_service_lcore_stop(uint32_t lcore)
 	return 0;
 }
 
+static uint64_t
+lcore_attr_get_loops(unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return __atomic_load_n(&cs->loops, __ATOMIC_RELAXED);
+}
+
+static uint64_t
+lcore_attr_get_service_calls(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return __atomic_load_n(&cs->calls_per_service[service_id],
+			       __ATOMIC_RELAXED);
+}
+
+static uint64_t
+lcore_attr_get_service_cycles(uint32_t service_id, unsigned int lcore)
+{
+	struct core_state *cs = &lcore_states[lcore];
+
+	return __atomic_load_n(&cs->cycles_per_service[service_id],
+			       __ATOMIC_RELAXED);
+}
+
+typedef uint64_t (*lcore_attr_get_fun)(uint32_t service_id,
+				       unsigned int lcore);
+
+static uint64_t
+attr_get(uint32_t id, lcore_attr_get_fun lcore_attr_get)
+{
+	unsigned int lcore;
+	uint64_t sum = 0;
+
+	for (lcore = 0; lcore < RTE_MAX_LCORE; lcore++)
+		if (lcore_states[lcore].is_service_core)
+			sum += lcore_attr_get(id, lcore);
+
+	return sum;
+}
+
+static uint64_t
+attr_get_service_calls(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_calls);
+}
+
+static uint64_t
+attr_get_service_cycles(uint32_t service_id)
+{
+	return attr_get(service_id, lcore_attr_get_service_cycles);
+}
+
 int32_t
 rte_service_attr_get(uint32_t id, uint32_t attr_id, uint64_t *attr_value)
 {
-	struct rte_service_spec_impl *s;
-	SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL);
+	if (!service_valid(id))
+		return -EINVAL;
 
 	if (!attr_value)
 		return -EINVAL;
 
 	switch (attr_id) {
-	case RTE_SERVICE_ATTR_CYCLES:
-		*attr_value = s->cycles_spent;
-		return 0;
 	case RTE_SERVICE_ATTR_CALL_COUNT:
-		*attr_value = s->calls;
+		*attr_value = attr_get_service_calls(id);
+		return 0;
+	case RTE_SERVICE_ATTR_CYCLES:
+		*attr_value = attr_get_service_cycles(id);
 		return 0;
 	default:
 		return -EINVAL;
@@ -849,7 +894,7 @@ rte_service_lcore_attr_get(uint32_t lcore, uint32_t attr_id,
 
 	switch (attr_id) {
 	case RTE_SERVICE_LCORE_ATTR_LOOPS:
-		*attr_value = cs->loops;
+		*attr_value = lcore_attr_get_loops(lcore);
 		return 0;
 	default:
 		return -EINVAL;
@@ -859,11 +904,18 @@ rte_service_lcore_attr_get(uint32_t lcore, uint32_t attr_id,
 int32_t
 rte_service_attr_reset_all(uint32_t id)
 {
-	struct rte_service_spec_impl *s;
-	SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL);
+	unsigned int lcore;
+
+	if (!service_valid(id))
+		return -EINVAL;
+
+	for (lcore = 0; lcore < RTE_MAX_LCORE; lcore++) {
+		struct core_state *cs = &lcore_states[lcore];
+
+		cs->calls_per_service[id] = 0;
+		cs->cycles_per_service[id] = 0;
+	}
 
-	s->cycles_spent = 0;
-	s->calls = 0;
 	return 0;
 }
 
@@ -885,17 +937,25 @@ rte_service_lcore_attr_reset_all(uint32_t lcore)
 }
 
 static void
-service_dump_one(FILE *f, struct rte_service_spec_impl *s)
+service_dump_one(FILE *f, uint32_t id)
 {
+	struct rte_service_spec_impl *s;
+	uint64_t service_calls;
+	uint64_t service_cycles;
+
+	service_calls = attr_get_service_calls(id);
+	service_cycles = attr_get_service_cycles(id);
+
 	/* avoid divide by zero */
-	int calls = 1;
+	if (service_calls == 0)
+		service_calls = 1;
+
+	s = service_get(id);
 
-	if (s->calls != 0)
-		calls = s->calls;
 	fprintf(f, " %s: stats %d\tcalls %"PRIu64"\tcycles %"
 			PRIu64"\tavg: %"PRIu64"\n",
-			s->spec.name, service_stats_enabled(s), s->calls,
-			s->cycles_spent, s->cycles_spent / calls);
+			s->spec.name, service_stats_enabled(s), service_calls,
+			service_cycles, service_cycles / service_calls);
 }
 
 static void
@@ -906,7 +966,7 @@ service_dump_calls_per_lcore(FILE *f, uint32_t lcore)
 
 	fprintf(f, "%02d\t", lcore);
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-		if (!service_valid(i))
+		if (!service_registered(i))
 			continue;
 		fprintf(f, "%"PRIu64"\t", cs->calls_per_service[i]);
 	}
@@ -924,16 +984,16 @@ rte_service_dump(FILE *f, uint32_t id)
 		struct rte_service_spec_impl *s;
 		SERVICE_VALID_GET_OR_ERR_RET(id, s, -EINVAL);
 		fprintf(f, "Service %s Summary\n", s->spec.name);
-		service_dump_one(f, s);
+		service_dump_one(f, id);
 		return 0;
 	}
 
 	/* print all services, as UINT32_MAX was passed as id */
 	fprintf(f, "Services Summary\n");
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
-		if (!service_valid(i))
+		if (!service_registered(i))
 			continue;
-		service_dump_one(f, &rte_services[i]);
+		service_dump_one(f, i);
 	}
 
 	fprintf(f, "Service Cores Summary\n");
-- 
2.34.1
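
For readers without a DPDK tree at hand, the counter scheme this patch moves to can be sketched in isolation. The following is a minimal illustration, not the DPDK implementation: MAX_WORKERS, MAX_SERVICES, struct worker_stats and the stats_*() functions are hypothetical names, and the 64-byte alignment is an assumed cache line size. Each worker owns one slot and is its only writer, so a plain load followed by a relaxed atomic store replaces the atomic add; a reader sums the slots with relaxed atomic loads.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_WORKERS 4	/* hypothetical; stands in for RTE_MAX_LCORE */
#define MAX_SERVICES 8	/* hypothetical; stands in for RTE_SERVICE_NUM_MAX */

/* One slot per worker. The alignment (assumed 64-byte cache lines)
 * keeps two workers from writing to the same cache line.
 */
struct worker_stats {
	uint64_t calls[MAX_SERVICES];
	uint64_t cycles[MAX_SERVICES];
} __attribute__((aligned(64)));

static struct worker_stats stats[MAX_WORKERS];

/* Must only be called by the thread owning slot 'worker'. */
static void
stats_record(unsigned int worker, unsigned int service, uint64_t cycles)
{
	struct worker_stats *ws;

	assert(worker < MAX_WORKERS && service < MAX_SERVICES);
	ws = &stats[worker];

	/* Single writer: a non-atomic load plus a relaxed atomic store
	 * is enough, no atomic read-modify-write is needed.
	 */
	__atomic_store_n(&ws->calls[service], ws->calls[service] + 1,
			 __ATOMIC_RELAXED);
	__atomic_store_n(&ws->cycles[service], ws->cycles[service] + cycles,
			 __ATOMIC_RELAXED);
}

/* May be called from any thread; sums the per-worker call counters. */
static uint64_t
stats_total_calls(unsigned int service)
{
	uint64_t sum = 0;
	unsigned int w;

	assert(service < MAX_SERVICES);
	for (w = 0; w < MAX_WORKERS; w++)
		sum += __atomic_load_n(&stats[w].calls[service],
				       __ATOMIC_RELAXED);
	return sum;
}

/* May be called from any thread; sums the per-worker cycle counters. */
static uint64_t
stats_total_cycles(unsigned int service)
{
	uint64_t sum = 0;
	unsigned int w;

	assert(service < MAX_SERVICES);
	for (w = 0; w < MAX_WORKERS; w++)
		sum += __atomic_load_n(&stats[w].cycles[service],
				       __ATOMIC_RELAXED);
	return sum;
}
```

The trade-off is the same as in the patch: the write path touches only worker-local cache lines, while a read walks every slot, and the sum is not a consistent snapshot across workers. That is usually acceptable for statistics.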