From: Konstantin Ananyev <konstantin.ananyev@huawei.com>
To: Tomasz Duszynski <tduszynski@marvell.com>,
Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
"dev@dpdk.org" <dev@dpdk.org>
Subject: RE: [EXT] Re: [PATCH v11 1/4] lib: add generic support for reading PMU events
Date: Fri, 17 Feb 2023 10:14:40 +0000 [thread overview]
Message-ID: <b00d773b3a2d4dd3a81cb67733d8a76a@huawei.com> (raw)
In-Reply-To: <DM4PR18MB43685A2D4112F65069769D09D2A19@DM4PR18MB4368.namprd18.prod.outlook.com>
> >>
> >> This is especially useful in cases where CPU cores are isolated i.e
> >> run dedicated tasks. In such cases one cannot use standard perf
> >> utility without sacrificing latency and performance.
> >>
> >> Signed-off-by: Tomasz Duszynski <tduszynski@marvell.com>
> >> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >> ---
> >> MAINTAINERS | 5 +
> >> app/test/meson.build | 2 +
> >> app/test/test_pmu.c | 62 ++++
> >> doc/api/doxy-api-index.md | 3 +-
> >> doc/api/doxy-api.conf.in | 1 +
> >> doc/guides/prog_guide/profile_app.rst | 12 +
> >> doc/guides/rel_notes/release_23_03.rst | 7 +
> >> lib/meson.build | 1 +
> >> lib/pmu/meson.build | 13 +
> >> lib/pmu/pmu_private.h | 32 ++
> >> lib/pmu/rte_pmu.c | 460 +++++++++++++++++++++++++
> >> lib/pmu/rte_pmu.h | 212 ++++++++++++
> >> lib/pmu/version.map | 15 +
> >> 13 files changed, 824 insertions(+), 1 deletion(-)
> >> create mode 100644 app/test/test_pmu.c
> >> create mode 100644 lib/pmu/meson.build
> >> create mode 100644 lib/pmu/pmu_private.h
> >> create mode 100644 lib/pmu/rte_pmu.c
> >> create mode 100644 lib/pmu/rte_pmu.h
> >> create mode 100644 lib/pmu/version.map
> >>
> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 3495946d0f..d37f242120 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -1697,6 +1697,11 @@ M: Nithin Dabilpuram <ndabilpuram@marvell.com>
> >> M: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> F: lib/node/
> >>
> >> +PMU - EXPERIMENTAL
> >> +M: Tomasz Duszynski <tduszynski@marvell.com>
> >> +F: lib/pmu/
> >> +F: app/test/test_pmu*
> >> +
> >>
> >> Test Applications
> >> -----------------
> >> diff --git a/app/test/meson.build b/app/test/meson.build
> >> index f34d19e3c3..6b61b7fc32 100644
> >> --- a/app/test/meson.build
> >> +++ b/app/test/meson.build
> >> @@ -111,6 +111,7 @@ test_sources = files(
> >> 'test_reciprocal_division_perf.c',
> >> 'test_red.c',
> >> 'test_pie.c',
> >> + 'test_pmu.c',
> >> 'test_reorder.c',
> >> 'test_rib.c',
> >> 'test_rib6.c',
> >> @@ -239,6 +240,7 @@ fast_tests = [
> >> ['kni_autotest', false, true],
> >> ['kvargs_autotest', true, true],
> >> ['member_autotest', true, true],
> >> + ['pmu_autotest', true, true],
> >> ['power_cpufreq_autotest', false, true],
> >> ['power_autotest', true, true],
> >> ['power_kvm_vm_autotest', false, true],
> >> diff --git a/app/test/test_pmu.c b/app/test/test_pmu.c
> >> new file mode 100644
> >> index 0000000000..c257638e8b
> >> --- /dev/null
> >> +++ b/app/test/test_pmu.c
> >> @@ -0,0 +1,62 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(C) 2023 Marvell International Ltd.
> >> + */
> >> +
> >> +#include "test.h"
> >> +
> >> +#ifndef RTE_EXEC_ENV_LINUX
> >> +
> >> +static int
> >> +test_pmu(void)
> >> +{
> >> + printf("pmu_autotest only supported on Linux, skipping test\n");
> >> + return TEST_SKIPPED;
> >> +}
> >> +
> >> +#else
> >> +
> >> +#include <rte_pmu.h>
> >> +
> >> +static int
> >> +test_pmu_read(void)
> >> +{
> >> + const char *name = NULL;
> >> + int tries = 10, event;
> >> + uint64_t val = 0;
> >> +
> >> + if (name == NULL) {
> >> + printf("PMU not supported on this arch\n");
> >> + return TEST_SKIPPED;
> >> + }
> >> +
> >> + if (rte_pmu_init() < 0)
> >> + return TEST_SKIPPED;
> >> +
> >> + event = rte_pmu_add_event(name);
> >> + while (tries--)
> >> + val += rte_pmu_read(event);
> >> +
> >> + rte_pmu_fini();
> >> +
> >> + return val ? TEST_SUCCESS : TEST_FAILED;
> >> +}
> >> +
> >> +static struct unit_test_suite pmu_tests = {
> >> + .suite_name = "pmu autotest",
> >> + .setup = NULL,
> >> + .teardown = NULL,
> >> + .unit_test_cases = {
> >> + TEST_CASE(test_pmu_read),
> >> + TEST_CASES_END()
> >> + }
> >> +};
> >> +
> >> +static int
> >> +test_pmu(void)
> >> +{
> >> + return unit_test_suite_runner(&pmu_tests);
> >> +}
> >> +
> >> +#endif /* RTE_EXEC_ENV_LINUX */
> >> +
> >> +REGISTER_TEST_COMMAND(pmu_autotest, test_pmu);
> >> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
> >> index 2deec7ea19..a8e04a195d 100644
> >> --- a/doc/api/doxy-api-index.md
> >> +++ b/doc/api/doxy-api-index.md
> >> @@ -223,7 +223,8 @@ The public API headers are grouped by topics:
> >> [log](@ref rte_log.h),
> >> [errno](@ref rte_errno.h),
> >> [trace](@ref rte_trace.h),
> >> - [trace_point](@ref rte_trace_point.h)
> >> + [trace_point](@ref rte_trace_point.h),
> >> + [pmu](@ref rte_pmu.h)
> >>
> >> - **misc**:
> >> [EAL config](@ref rte_eal.h),
> >> diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
> >> index e859426099..350b5a8c94 100644
> >> --- a/doc/api/doxy-api.conf.in
> >> +++ b/doc/api/doxy-api.conf.in
> >> @@ -63,6 +63,7 @@ INPUT = @TOPDIR@/doc/api/doxy-api-index.md \
> >> @TOPDIR@/lib/pci \
> >> @TOPDIR@/lib/pdump \
> >> @TOPDIR@/lib/pipeline \
> >> + @TOPDIR@/lib/pmu \
> >> @TOPDIR@/lib/port \
> >> @TOPDIR@/lib/power \
> >> @TOPDIR@/lib/rawdev \
> >> diff --git a/doc/guides/prog_guide/profile_app.rst b/doc/guides/prog_guide/profile_app.rst
> >> index 14292d4c25..89e38cd301 100644
> >> --- a/doc/guides/prog_guide/profile_app.rst
> >> +++ b/doc/guides/prog_guide/profile_app.rst
> >> @@ -7,6 +7,18 @@ Profile Your Application
> >> The following sections describe methods of profiling DPDK applications on
> >> different architectures.
> >>
> >> +Performance counter based profiling
> >> +-----------------------------------
> >> +
> >> +Majority of architectures support some performance monitoring unit (PMU).
> >> +Such unit provides programmable counters that monitor specific events.
> >> +
> >> +Different tools gather that information, like for example perf.
> >> +However, in some scenarios when CPU cores are isolated and run
> >> +dedicated tasks interrupting those tasks with perf may be undesirable.
> >> +
> >> +In such cases, an application can use the PMU library to read such events via ``rte_pmu_read()``.
> >> +
> >>
> >> Profiling on x86
> >> ----------------
> >> diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst
> >> index ab998a5357..20622efe58 100644
> >> --- a/doc/guides/rel_notes/release_23_03.rst
> >> +++ b/doc/guides/rel_notes/release_23_03.rst
> >> @@ -147,6 +147,13 @@ New Features
> >> * Added support to capture packets at each graph node with packet metadata and
> >> node name.
> >>
> >> +* **Added PMU library.**
> >> +
> >> + Added a new performance monitoring unit (PMU) library which allows
> >> + applications to perform self monitoring activities without depending on
> >> + external utilities like perf.
> >> + After integration with :doc:`../prog_guide/trace_lib` data gathered
> >> + from hardware counters can be stored in CTF format for further analysis.
> >> +
> >>
> >> Removed Items
> >> -------------
> >> diff --git a/lib/meson.build b/lib/meson.build
> >> index 450c061d2b..8a42d45d20 100644
> >> --- a/lib/meson.build
> >> +++ b/lib/meson.build
> >> @@ -11,6 +11,7 @@
> >> libraries = [
> >> 'kvargs', # eal depends on kvargs
> >> 'telemetry', # basic info querying
> >> + 'pmu',
> >> 'eal', # everything depends on eal
> >> 'ring',
> >> 'rcu', # rcu depends on ring
> >> diff --git a/lib/pmu/meson.build b/lib/pmu/meson.build
> >> new file mode 100644
> >> index 0000000000..a4160b494e
> >> --- /dev/null
> >> +++ b/lib/pmu/meson.build
> >> @@ -0,0 +1,13 @@
> >> +# SPDX-License-Identifier: BSD-3-Clause
> >> +# Copyright(C) 2023 Marvell International Ltd.
> >> +
> >> +if not is_linux
> >> + build = false
> >> + reason = 'only supported on Linux'
> >> + subdir_done()
> >> +endif
> >> +
> >> +includes = [global_inc]
> >> +
> >> +sources = files('rte_pmu.c')
> >> +headers = files('rte_pmu.h')
> >> diff --git a/lib/pmu/pmu_private.h b/lib/pmu/pmu_private.h
> >> new file mode 100644
> >> index 0000000000..b9f8c1ddc8
> >> --- /dev/null
> >> +++ b/lib/pmu/pmu_private.h
> >> @@ -0,0 +1,32 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2023 Marvell
> >> + */
> >> +
> >> +#ifndef _PMU_PRIVATE_H_
> >> +#define _PMU_PRIVATE_H_
> >> +
> >> +/**
> >> + * Architecture specific PMU init callback.
> >> + *
> >> + * @return
> >> + * 0 in case of success, negative value otherwise.
> >> + */
> >> +int
> >> +pmu_arch_init(void);
> >> +
> >> +/**
> >> + * Architecture specific PMU cleanup callback.
> >> + */
> >> +void
> >> +pmu_arch_fini(void);
> >> +
> >> +/**
> >> + * Apply architecture specific settings to config before passing it to syscall.
> >> + *
> >> + * @param config
> >> + * Architecture specific event configuration. Consult kernel sources for available options.
> >> + */
> >> +void
> >> +pmu_arch_fixup_config(uint64_t config[3]);
> >> +
> >> +#endif /* _PMU_PRIVATE_H_ */
> >> diff --git a/lib/pmu/rte_pmu.c b/lib/pmu/rte_pmu.c
> >> new file mode 100644
> >> index 0000000000..950f999cb7
> >> --- /dev/null
> >> +++ b/lib/pmu/rte_pmu.c
> >> @@ -0,0 +1,460 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(C) 2023 Marvell International Ltd.
> >> + */
> >> +
> >> +#include <ctype.h>
> >> +#include <dirent.h>
> >> +#include <errno.h>
> >> +#include <regex.h>
> >> +#include <stdlib.h>
> >> +#include <string.h>
> >> +#include <sys/ioctl.h>
> >> +#include <sys/mman.h>
> >> +#include <sys/queue.h>
> >> +#include <sys/syscall.h>
> >> +#include <unistd.h>
> >> +
> >> +#include <rte_atomic.h>
> >> +#include <rte_per_lcore.h>
> >> +#include <rte_pmu.h>
> >> +#include <rte_spinlock.h>
> >> +#include <rte_tailq.h>
> >> +
> >> +#include "pmu_private.h"
> >> +
> >> +#define EVENT_SOURCE_DEVICES_PATH "/sys/bus/event_source/devices"
> >
> >
> >I suppose that path (as the whole implementation) is Linux-specific?
> >If so, wouldn't it make sense to have it under a linux subdir?
> >
>
> There are currently no plans to support this anywhere else, so a flat
> directory structure is good enough.
>
> >> +
> >> +#define GENMASK_ULL(h, l) ((~0ULL - (1ULL << (l)) + 1) & (~0ULL >> ((64 - 1 - (h)))))
> >> +#define FIELD_PREP(m, v) (((uint64_t)(v) << (__builtin_ffsll(m) - 1)) & (m))
> >> +
> >> +RTE_DEFINE_PER_LCORE(struct rte_pmu_event_group, _event_group);
> >> +struct rte_pmu rte_pmu;
> >
> >Do we really need the struct declaration here?
> >
>
> What’s the problem with this placement precisely?
Not a big deal, but it seems excessive to me.
As I understand it, you already have an include just above for the whole .h
that contains the declaration of that struct anyway.
>
> >
> >> +/*
> >> + * Following __rte_weak functions provide default no-op. Architectures should
> >> + * override them if necessary.
> >> + */
> >> +
> >> +int
> >> +__rte_weak pmu_arch_init(void)
> >> +{
> >> + return 0;
> >> +}
> >> +
> >> +void
> >> +__rte_weak pmu_arch_fini(void)
> >> +{
> >> +}
> >> +
> >> +void
> >> +__rte_weak pmu_arch_fixup_config(uint64_t __rte_unused config[3])
> >> +{
> >> +}
> >> +
> >> +static int
> >> +get_term_format(const char *name, int *num, uint64_t *mask)
> >> +{
> >> + char path[PATH_MAX];
> >> + char *config = NULL;
> >> + int high, low, ret;
> >> + FILE *fp;
> >> +
> >> + *num = *mask = 0;
> >> + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/format/%s", rte_pmu.name, name);
> >> + fp = fopen(path, "r");
> >> + if (fp == NULL)
> >> + return -errno;
> >> +
> >> + errno = 0;
> >> + ret = fscanf(fp, "%m[^:]:%d-%d", &config, &low, &high);
> >> + if (ret < 2) {
> >> + ret = -ENODATA;
> >> + goto out;
> >> + }
> >> + if (errno) {
> >> + ret = -errno;
> >> + goto out;
> >> + }
> >> +
> >> + if (ret == 2)
> >> + high = low;
> >> +
> >> + *mask = GENMASK_ULL(high, low);
> >> + /* Last digit should be [012]. If last digit is missing 0 is implied. */
> >> + *num = config[strlen(config) - 1];
> >> + *num = isdigit(*num) ? *num - '0' : 0;
> >> +
> >> + ret = 0;
> >> +out:
> >> + free(config);
> >> + fclose(fp);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int
> >> +parse_event(char *buf, uint64_t config[3])
> >> +{
> >> + char *token, *term;
> >> + int num, ret, val;
> >> + uint64_t mask;
> >> +
> >> + config[0] = config[1] = config[2] = 0;
> >> +
> >> + token = strtok(buf, ",");
> >> + while (token) {
> >> + errno = 0;
> >> + /* <term>=<value> */
> >> + ret = sscanf(token, "%m[^=]=%i", &term, &val);
> >> + if (ret < 1)
> >> + return -ENODATA;
> >> + if (errno)
> >> + return -errno;
> >> + if (ret == 1)
> >> + val = 1;
> >> +
> >> + ret = get_term_format(term, &num, &mask);
> >> + free(term);
> >> + if (ret)
> >> + return ret;
> >> +
> >> + config[num] |= FIELD_PREP(mask, val);
> >> + token = strtok(NULL, ",");
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +static int
> >> +get_event_config(const char *name, uint64_t config[3])
> >> +{
> >> + char path[PATH_MAX], buf[BUFSIZ];
> >> + FILE *fp;
> >> + int ret;
> >> +
> >> + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/events/%s", rte_pmu.name, name);
> >> + fp = fopen(path, "r");
> >> + if (fp == NULL)
> >> + return -errno;
> >> +
> >> + ret = fread(buf, 1, sizeof(buf), fp);
> >> + if (ret == 0) {
> >> + fclose(fp);
> >> +
> >> + return -EINVAL;
> >> + }
> >> + fclose(fp);
> >> + buf[ret] = '\0';
> >> +
> >> + return parse_event(buf, config);
> >> +}
> >> +
> >> +static int
> >> +do_perf_event_open(uint64_t config[3], int group_fd)
> >> +{
> >> + struct perf_event_attr attr = {
> >> + .size = sizeof(struct perf_event_attr),
> >> + .type = PERF_TYPE_RAW,
> >> + .exclude_kernel = 1,
> >> + .exclude_hv = 1,
> >> + .disabled = 1,
> >> + };
> >> +
> >> + pmu_arch_fixup_config(config);
> >> +
> >> + attr.config = config[0];
> >> + attr.config1 = config[1];
> >> + attr.config2 = config[2];
> >> +
> >> + return syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0);
> >> +}
> >> +
> >> +static int
> >> +open_events(struct rte_pmu_event_group *group)
> >> +{
> >> + struct rte_pmu_event *event;
> >> + uint64_t config[3];
> >> + int num = 0, ret;
> >> +
> >> + /* group leader gets created first, with fd = -1 */
> >> + group->fds[0] = -1;
> >> +
> >> + TAILQ_FOREACH(event, &rte_pmu.event_list, next) {
> >> + ret = get_event_config(event->name, config);
> >> + if (ret)
> >> + continue;
> >> +
> >> + ret = do_perf_event_open(config, group->fds[0]);
> >> + if (ret == -1) {
> >> + ret = -errno;
> >> + goto out;
> >> + }
> >> +
> >> + group->fds[event->index] = ret;
> >> + num++;
> >> + }
> >> +
> >> + return 0;
> >> +out:
> >> + for (--num; num >= 0; num--) {
> >> + close(group->fds[num]);
> >> + group->fds[num] = -1;
> >> + }
> >> +
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int
> >> +mmap_events(struct rte_pmu_event_group *group)
> >> +{
> >> + long page_size = sysconf(_SC_PAGE_SIZE);
> >> + unsigned int i;
> >> + void *addr;
> >> + int ret;
> >> +
> >> + for (i = 0; i < rte_pmu.num_group_events; i++) {
> >> + addr = mmap(0, page_size, PROT_READ, MAP_SHARED, group->fds[i], 0);
> >> + if (addr == MAP_FAILED) {
> >> + ret = -errno;
> >> + goto out;
> >> + }
> >> +
> >> + group->mmap_pages[i] = addr;
> >> + if (!group->mmap_pages[i]->cap_user_rdpmc) {
> >> + ret = -EPERM;
> >> + goto out;
> >> + }
> >> + }
> >> +
> >> + return 0;
> >> +out:
> >> + for (; i; i--) {
> >> + munmap(group->mmap_pages[i - 1], page_size);
> >> + group->mmap_pages[i - 1] = NULL;
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static void
> >> +cleanup_events(struct rte_pmu_event_group *group)
> >> +{
> >> + unsigned int i;
> >> +
> >> + if (group->fds[0] != -1)
> >> + ioctl(group->fds[0], PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);
> >> +
> >> + for (i = 0; i < rte_pmu.num_group_events; i++) {
> >> + if (group->mmap_pages[i]) {
> >> + munmap(group->mmap_pages[i], sysconf(_SC_PAGE_SIZE));
> >> + group->mmap_pages[i] = NULL;
> >> + }
> >> +
> >> + if (group->fds[i] != -1) {
> >> + close(group->fds[i]);
> >> + group->fds[i] = -1;
> >> + }
> >> + }
> >> +
> >> + group->enabled = false;
> >> +}
> >> +
> >> +int
> >> +__rte_pmu_enable_group(void)
> >> +{
> >> + struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
> >> + int ret;
> >> +
> >> + if (rte_pmu.num_group_events == 0)
> >> + return -ENODEV;
> >> +
> >> + ret = open_events(group);
> >> + if (ret)
> >> + goto out;
> >> +
> >> + ret = mmap_events(group);
> >> + if (ret)
> >> + goto out;
> >> +
> >> + if (ioctl(group->fds[0], PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) == -1) {
> >> + ret = -errno;
> >> + goto out;
> >> + }
> >> +
> >> + if (ioctl(group->fds[0], PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP) == -1) {
> >> + ret = -errno;
> >> + goto out;
> >> + }
> >> +
> >> + rte_spinlock_lock(&rte_pmu.lock);
> >> + TAILQ_INSERT_TAIL(&rte_pmu.event_group_list, group, next);
> >> + rte_spinlock_unlock(&rte_pmu.lock);
> >> + group->enabled = true;
> >> +
> >> + return 0;
> >> +
> >> +out:
> >> + cleanup_events(group);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int
> >> +scan_pmus(void)
> >> +{
> >> + char path[PATH_MAX];
> >> + struct dirent *dent;
> >> + const char *name;
> >> + DIR *dirp;
> >> +
> >> + dirp = opendir(EVENT_SOURCE_DEVICES_PATH);
> >> + if (dirp == NULL)
> >> + return -errno;
> >> +
> >> + while ((dent = readdir(dirp))) {
> >> + name = dent->d_name;
> >> + if (name[0] == '.')
> >> + continue;
> >> +
> >> + /* sysfs entry should either contain cpus or be a cpu */
> >> + if (!strcmp(name, "cpu"))
> >> + break;
> >> +
> >> + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/cpus", name);
> >> + if (access(path, F_OK) == 0)
> >> + break;
> >> + }
> >> +
> >> + if (dent) {
> >> + rte_pmu.name = strdup(name);
> >> + if (rte_pmu.name == NULL) {
> >> + closedir(dirp);
> >> +
> >> + return -ENOMEM;
> >> + }
> >> + }
> >> +
> >> + closedir(dirp);
> >> +
> >> + return rte_pmu.name ? 0 : -ENODEV;
> >> +}
> >> +
> >> +static struct rte_pmu_event *
> >> +new_event(const char *name)
> >> +{
> >> + struct rte_pmu_event *event;
> >> +
> >> + event = calloc(1, sizeof(*event));
> >> + if (event == NULL)
> >> + goto out;
> >> +
> >> + event->name = strdup(name);
> >> + if (event->name == NULL) {
> >> + free(event);
> >> + event = NULL;
> >> + }
> >> +
> >> +out:
> >> + return event;
> >> +}
> >> +
> >> +static void
> >> +free_event(struct rte_pmu_event *event)
> >> +{
> >> + free(event->name);
> >> + free(event);
> >> +}
> >> +
> >> +int
> >> +rte_pmu_add_event(const char *name)
> >> +{
> >> + struct rte_pmu_event *event;
> >> + char path[PATH_MAX];
> >> +
> >> + if (rte_pmu.name == NULL)
> >> + return -ENODEV;
> >> +
> >> + if (rte_pmu.num_group_events + 1 >= MAX_NUM_GROUP_EVENTS)
> >> + return -ENOSPC;
> >> +
> >> + snprintf(path, sizeof(path), EVENT_SOURCE_DEVICES_PATH "/%s/events/%s", rte_pmu.name, name);
> >> + if (access(path, R_OK))
> >> + return -ENODEV;
> >> +
> >> + TAILQ_FOREACH(event, &rte_pmu.event_list, next) {
> >> + if (!strcmp(event->name, name))
> >> + return event->index;
> >> + continue;
> >> + }
> >> +
> >> + event = new_event(name);
> >> + if (event == NULL)
> >> + return -ENOMEM;
> >> +
> >> + event->index = rte_pmu.num_group_events++;
> >> + TAILQ_INSERT_TAIL(&rte_pmu.event_list, event, next);
> >> +
> >> + return event->index;
> >> +}
> >> +
> >> +int
> >> +rte_pmu_init(void)
> >> +{
> >> + int ret;
> >> +
> >> + /* Allow calling init from multiple contexts within a single thread. This simplifies
> >> + * resource management a bit e.g in case fast-path tracepoint has already been enabled
> >> + * via command line but application doesn't care enough and performs init/fini again.
> >> + */
> >> + if (rte_pmu.initialized != 0) {
> >> + rte_pmu.initialized++;
> >> + return 0;
> >> + }
> >> +
> >> + ret = scan_pmus();
> >> + if (ret)
> >> + goto out;
> >> +
> >> + ret = pmu_arch_init();
> >> + if (ret)
> >> + goto out;
> >> +
> >> + TAILQ_INIT(&rte_pmu.event_list);
> >> + TAILQ_INIT(&rte_pmu.event_group_list);
> >> + rte_spinlock_init(&rte_pmu.lock);
> >> + rte_pmu.initialized = 1;
> >> +
> >> + return 0;
> >> +out:
> >> + free(rte_pmu.name);
> >> + rte_pmu.name = NULL;
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +void
> >> +rte_pmu_fini(void)
> >> +{
> >> + struct rte_pmu_event_group *group, *tmp_group;
> >> + struct rte_pmu_event *event, *tmp_event;
> >> +
> >> + /* cleanup once init count drops to zero */
> >> + if (rte_pmu.initialized == 0 || --rte_pmu.initialized != 0)
> >> + return;
> >> +
> >> + RTE_TAILQ_FOREACH_SAFE(event, &rte_pmu.event_list, next, tmp_event) {
> >> + TAILQ_REMOVE(&rte_pmu.event_list, event, next);
> >> + free_event(event);
> >> + }
> >> +
> >> + RTE_TAILQ_FOREACH_SAFE(group, &rte_pmu.event_group_list, next, tmp_group) {
> >> + TAILQ_REMOVE(&rte_pmu.event_group_list, group, next);
> >> + cleanup_events(group);
> >> + }
> >> +
> >> + pmu_arch_fini();
> >> + free(rte_pmu.name);
> >> + rte_pmu.name = NULL;
> >> + rte_pmu.num_group_events = 0;
> >> +}
> >> diff --git a/lib/pmu/rte_pmu.h b/lib/pmu/rte_pmu.h
> >> new file mode 100644
> >> index 0000000000..6b664c3336
> >> --- /dev/null
> >> +++ b/lib/pmu/rte_pmu.h
> >> @@ -0,0 +1,212 @@
> >> +/* SPDX-License-Identifier: BSD-3-Clause
> >> + * Copyright(c) 2023 Marvell
> >> + */
> >> +
> >> +#ifndef _RTE_PMU_H_
> >> +#define _RTE_PMU_H_
> >> +
> >> +/**
> >> + * @file
> >> + *
> >> + * PMU event tracing operations
> >> + *
> >> + * This file defines generic API and types necessary to setup PMU and
> >> + * read selected counters in runtime.
> >> + */
> >> +
> >> +#ifdef __cplusplus
> >> +extern "C" {
> >> +#endif
> >> +
> >> +#include <linux/perf_event.h>
> >> +
> >> +#include <rte_atomic.h>
> >> +#include <rte_branch_prediction.h>
> >> +#include <rte_common.h>
> >> +#include <rte_compat.h>
> >> +#include <rte_spinlock.h>
> >> +
> >> +/** Maximum number of events in a group */
> >> +#define MAX_NUM_GROUP_EVENTS 8
> >> +
> >> +/**
> >> + * A structure describing a group of events.
> >> + */
> >> +struct rte_pmu_event_group {
> >> + struct perf_event_mmap_page *mmap_pages[MAX_NUM_GROUP_EVENTS]; /**< array of user pages */
> >> + int fds[MAX_NUM_GROUP_EVENTS]; /**< array of event descriptors */
> >> + bool enabled; /**< true if group was enabled on particular lcore */
> >> + TAILQ_ENTRY(rte_pmu_event_group) next; /**< list entry */
> >> +} __rte_cache_aligned;
> >> +
> >> +/**
> >> + * A structure describing an event.
> >> + */
> >> +struct rte_pmu_event {
> >> + char *name; /**< name of an event */
> >> + unsigned int index; /**< event index into fds/mmap_pages */
> >> + TAILQ_ENTRY(rte_pmu_event) next; /**< list entry */
> >> +};
> >> +
> >> +/**
> >> + * A PMU state container.
> >> + */
> >> +struct rte_pmu {
> >> + char *name; /**< name of core PMU listed under /sys/bus/event_source/devices */
> >> + rte_spinlock_t lock; /**< serialize access to event group list */
> >> + TAILQ_HEAD(, rte_pmu_event_group) event_group_list; /**< list of event groups */
> >> + unsigned int num_group_events; /**< number of events in a group */
> >> + TAILQ_HEAD(, rte_pmu_event) event_list; /**< list of matching events */
> >> + unsigned int initialized; /**< initialization counter */
> >> +};
> >> +
> >> +/** lcore event group */
> >> +RTE_DECLARE_PER_LCORE(struct rte_pmu_event_group, _event_group);
> >> +
> >> +/** PMU state container */
> >> +extern struct rte_pmu rte_pmu;
> >> +
> >> +/** Each architecture supporting PMU needs to provide its own version */
> >> +#ifndef rte_pmu_pmc_read
> >> +#define rte_pmu_pmc_read(index) ({ 0; })
> >> +#endif
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Read PMU counter.
> >> + *
> >> + * @warning This should be not called directly.
> >> + *
> >> + * @param pc
> >> + * Pointer to the mmapped user page.
> >> + * @return
> >> + * Counter value read from hardware.
> >> + */
> >> +static __rte_always_inline uint64_t
> >> +__rte_pmu_read_userpage(struct perf_event_mmap_page *pc)
> >> +{
> >> + uint64_t width, offset;
> >> + uint32_t seq, index;
> >> + int64_t pmc;
> >> +
> >> + for (;;) {
> >> + seq = pc->lock;
> >> + rte_compiler_barrier();
> >
> >Are you sure that compiler_barrier() is enough here?
> >On some archs the CPU itself is free to re-order reads.
> >Or am I missing something obvious here?
> >
>
> It's a matter of not keeping stale values cached in registers
> and making sure that there are two reads of lock. CPU reordering
> won't do any harm here.
Sorry, I didn't get you here:
suppose the CPU re-orders the reads and reads lock *after* the index or offset values.
Wouldn't that mean index and/or offset can contain old/invalid values?
>
> >> + index = pc->index;
> >> + offset = pc->offset;
> >> + width = pc->pmc_width;
> >> +
> >> + /* index set to 0 means that particular counter cannot be used */
> >> + if (likely(pc->cap_user_rdpmc && index)) {
> >> + pmc = rte_pmu_pmc_read(index - 1);
> >> + pmc <<= 64 - width;
> >> + pmc >>= 64 - width;
> >> + offset += pmc;
> >> + }
> >> +
> >> + rte_compiler_barrier();
> >> +
> >> + if (likely(pc->lock == seq))
> >> + return offset;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Enable group of events on the calling lcore.
> >> + *
> >> + * @warning This should be not called directly.
> >> + *
> >> + * @return
> >> + * 0 in case of success, negative value otherwise.
> >> + */
> >> +__rte_experimental
> >> +int
> >> +__rte_pmu_enable_group(void);
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Initialize PMU library.
> >> + *
> >> + * @warning This should be not called directly.
> >> + *
> >> + * @return
> >> + * 0 in case of success, negative value otherwise.
> >> + */
> >> +__rte_experimental
> >> +int
> >> +rte_pmu_init(void);
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Finalize PMU library. This should be called after PMU counters are no longer being read.
> >> + */
> >> +__rte_experimental
> >> +void
> >> +rte_pmu_fini(void);
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Add event to the group of enabled events.
> >> + *
> >> + * @param name
> >> + * Name of an event listed under /sys/bus/event_source/devices/pmu/events.
> >> + * @return
> >> + * Event index in case of success, negative value otherwise.
> >> + */
> >> +__rte_experimental
> >> +int
> >> +rte_pmu_add_event(const char *name);
> >> +
> >> +/**
> >> + * @warning
> >> + * @b EXPERIMENTAL: this API may change without prior notice
> >> + *
> >> + * Read hardware counter configured to count occurrences of an event.
> >> + *
> >> + * @param index
> >> + * Index of an event to be read.
> >> + * @return
> >> + * Event value read from register. In case of errors or lack of support
> >> + * 0 is returned. In other words, stream of zeros in a trace file
> >> + * indicates problem with reading particular PMU event register.
> >> + */
> >> +__rte_experimental
> >> +static __rte_always_inline uint64_t
> >> +rte_pmu_read(unsigned int index)
> >> +{
> >> + struct rte_pmu_event_group *group = &RTE_PER_LCORE(_event_group);
> >> + int ret;
> >> +
> >> + if (unlikely(!rte_pmu.initialized))
> >> + return 0;
> >> +
> >> + if (unlikely(!group->enabled)) {
> >> + ret = __rte_pmu_enable_group();
> >> + if (ret)
> >> + return 0;
> >> + }
> >> +
> >> + if (unlikely(index >= rte_pmu.num_group_events))
> >> + return 0;
> >> +
> >> + return __rte_pmu_read_userpage(group->mmap_pages[index]);
> >> +}
> >> +
> >> +#ifdef __cplusplus
> >> +}
> >> +#endif
> >> +
> >> +#endif /* _RTE_PMU_H_ */
> >> diff --git a/lib/pmu/version.map b/lib/pmu/version.map
> >> new file mode 100644
> >> index 0000000000..39a4f279c1
> >> --- /dev/null
> >> +++ b/lib/pmu/version.map
> >> @@ -0,0 +1,15 @@
> >> +DPDK_23 {
> >> + local: *;
> >> +};
> >> +
> >> +EXPERIMENTAL {
> >> + global:
> >> +
> >> + __rte_pmu_enable_group;
> >> + per_lcore__event_group;
> >> + rte_pmu;
> >> + rte_pmu_add_event;
> >> + rte_pmu_fini;
> >> + rte_pmu_init;
> >> + rte_pmu_read;
> >> +};
2023-02-13 11:31 ` [PATCH v10 0/4] add support for self monitoring Tomasz Duszynski
2023-02-13 11:31 ` [PATCH v10 1/4] lib: add generic support for reading PMU events Tomasz Duszynski
2023-02-16 7:39 ` Ruifeng Wang
2023-02-16 14:44 ` Tomasz Duszynski
2023-02-13 11:31 ` [PATCH v10 2/4] pmu: support reading ARM PMU events in runtime Tomasz Duszynski
2023-02-16 7:41 ` Ruifeng Wang
2023-02-13 11:31 ` [PATCH v10 3/4] pmu: support reading Intel x86_64 " Tomasz Duszynski
2023-02-13 11:31 ` [PATCH v10 4/4] eal: add PMU support to tracing library Tomasz Duszynski
2023-02-16 17:54 ` [PATCH v11 0/4] add support for self monitoring Tomasz Duszynski
2023-02-16 17:54 ` [PATCH v11 1/4] lib: add generic support for reading PMU events Tomasz Duszynski
2023-02-16 23:50 ` Konstantin Ananyev
2023-02-17 8:49 ` [EXT] " Tomasz Duszynski
2023-02-17 10:14 ` Konstantin Ananyev [this message]
2023-02-19 14:23 ` Tomasz Duszynski
2023-02-20 14:31 ` Konstantin Ananyev
2023-02-20 16:59 ` Tomasz Duszynski
2023-02-20 17:21 ` Konstantin Ananyev
2023-02-20 20:42 ` Tomasz Duszynski
2023-02-21 0:48 ` Konstantin Ananyev
2023-02-27 8:12 ` Tomasz Duszynski
2023-02-28 11:35 ` Konstantin Ananyev
2023-02-21 12:15 ` Konstantin Ananyev
2023-02-21 2:17 ` Konstantin Ananyev
2023-02-27 9:19 ` [EXT] " Tomasz Duszynski
2023-02-27 20:53 ` Konstantin Ananyev
2023-02-28 8:25 ` Morten Brørup
2023-02-28 12:04 ` Konstantin Ananyev
2023-02-28 13:15 ` Morten Brørup
2023-02-28 16:22 ` Morten Brørup
2023-03-05 16:30 ` Konstantin Ananyev
2023-02-28 9:57 ` Tomasz Duszynski
2023-02-28 11:58 ` Konstantin Ananyev
2023-02-16 17:55 ` [PATCH v11 2/4] pmu: support reading ARM PMU events in runtime Tomasz Duszynski
2023-02-16 17:55 ` [PATCH v11 3/4] pmu: support reading Intel x86_64 " Tomasz Duszynski
2023-02-16 17:55 ` [PATCH v11 4/4] eal: add PMU support to tracing library Tomasz Duszynski
2023-02-16 18:03 ` [PATCH v11 0/4] add support for self monitoring Ruifeng Wang
2023-05-04 8:02 ` David Marchand
2023-07-31 12:33 ` Thomas Monjalon
2023-08-07 8:11 ` [EXT] " Tomasz Duszynski
2023-09-21 8:26 ` David Marchand
2023-01-25 10:33 ` [PATCH 0/2] add platform bus Tomasz Duszynski
2023-01-25 10:33 ` [PATCH 1/2] lib: add helper to read strings from sysfs files Tomasz Duszynski
2023-01-25 10:39 ` Thomas Monjalon
2023-01-25 16:16 ` Tyler Retzlaff
2023-01-26 8:30 ` [EXT] " Tomasz Duszynski
2023-01-26 17:21 ` Tyler Retzlaff
2023-01-26 8:35 ` Tomasz Duszynski
2023-01-25 10:33 ` [PATCH 2/2] bus: add platform bus Tomasz Duszynski
2023-01-25 10:41 ` [PATCH 0/2] " Tomasz Duszynski
2023-02-16 20:56 ` [PATCH v5 0/4] add support for self monitoring Liang Ma