From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 135AA43DD2; Mon, 8 Apr 2024 23:26:20 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 022994067A; Mon, 8 Apr 2024 23:26:20 +0200 (CEST) Received: from mail-pf1-f171.google.com (mail-pf1-f171.google.com [209.85.210.171]) by mails.dpdk.org (Postfix) with ESMTP id D29724064A for ; Mon, 8 Apr 2024 23:26:18 +0200 (CEST) Received: by mail-pf1-f171.google.com with SMTP id d2e1a72fcca58-6ed04c91c46so3151705b3a.0 for ; Mon, 08 Apr 2024 14:26:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=networkplumber-org.20230601.gappssmtp.com; s=20230601; t=1712611578; x=1713216378; darn=dpdk.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=e8EM7gia0ZiCANGv+rhleXV1EfI5Fa+g6j/LmqjQQXo=; b=md1nK73kQKv0l/JxY9kfsPiLcyAGs6veU4+yQx+FmA9O1Hmuy1MlJk59AFVNjKDFpe 0rhBfhyctI9aLEnx9oDd35vHTIT1oQ9IYva6vcQueGw84aguj5JwaKpy5IaUZg9AVcoY W51QQ7So/Ypbc+VzZtherWEXosJCfsb0RK5luWq+TjCKyzeLHi/1ake4nut8DvYdc5gE o3VUCnXVKV2qqMcIInZShFXpXJ1qug3zY3NmNqnleU7EpAzGSEnDgBxNOKZLqrw+Ownp au2qSwe8hUxeMPUAHJlakeLjid+uNts9/FPr+F4BPPyqsxwOLgAI5jD6yFmJTZ2R7P6k vARw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1712611578; x=1713216378; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=e8EM7gia0ZiCANGv+rhleXV1EfI5Fa+g6j/LmqjQQXo=; b=fzPuBRph2nnxw1AQKLrKJoahXCepd0ydI6IRyerkZzRA6H9nAUVkZrxnBHW17X9YJH HHZj3WYcJz09HrmJacBnl3uD61RSUEHe77QWONkRcprXpFbXT7Z4f2zmNc/jDxQf2Kld tAQpnsTzMC6in6wvLAYXIxGUL/26ru5vL17PxXyAWsH04YUcQSHv+ZQo3Q8MdC7Vf8LC hYe7Si46lxr2Rok2aiesoeM9sOSwG8/tpUf8p/mmKICjFj78kKusFnB+XrWGYFmlTMO/ hVnKWN/gNfhsq/C2g5u/sstptJPIROcZDwuPcZSV5J0zsfMzHJxqu8esvZRDoej/SUWH AWfg== X-Gm-Message-State: AOJu0YyapvsI8XdJh8ohKeXmaqN6UJDZSutw3u4M5jZ5x4sCswR1YM9M seE/rEjTLG/KuzkmZLEDlm0eV+EHgLH79ZURja6VpdUtozCcCYA02Xj2u6hrfjWdKzurq7D8gIz z94s= X-Google-Smtp-Source: AGHT+IEuwWhH0RlqJzpxf/Qj6H3nRwvoEJ43WsJD4SJXsLfcEFvItg4WrDqF1s9EAixzDIkH1DZN5g== X-Received: by 2002:a05:6a20:9194:b0:1a7:4b33:7c6f with SMTP id v20-20020a056a20919400b001a74b337c6fmr9537344pzd.43.1712611577911; Mon, 08 Apr 2024 14:26:17 -0700 (PDT) Received: from hermes.lan (204-195-96-226.wavecable.com. [204.195.96.226]) by smtp.gmail.com with ESMTPSA id x6-20020aa79a46000000b006eceaccaec9sm6947667pfj.131.2024.04.08.14.26.17 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Apr 2024 14:26:17 -0700 (PDT) From: Stephen Hemminger To: dev@dpdk.org Cc: Stephen Hemminger , Reshma Pattan Subject: [PATCH v2] latencystats: performance overhaul Date: Mon, 8 Apr 2024 14:26:15 -0700 Message-ID: <20240408212615.416813-1-stephen@networkplumber.org> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20240408195036.182545-1-stephen@networkplumber.org> References: <20240408195036.182545-1-stephen@networkplumber.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org The latencystats library had multiple performance issues. - using floating point in the fast path of Tx - global lock in fast path - global stats cause cache thrashing - divide instructions in fast path Other minor issues: - message split across lines - error messages not printed with correct priority Signed-off-by: Stephen Hemminger --- lib/latencystats/rte_latencystats.c | 334 ++++++++++++++++------------ 1 file changed, 187 insertions(+), 147 deletions(-) diff --git a/lib/latencystats/rte_latencystats.c b/lib/latencystats/rte_latencystats.c index 4ea9b0d75b..5154e0650a 100644 --- a/lib/latencystats/rte_latencystats.c +++ b/lib/latencystats/rte_latencystats.c @@ -2,29 +2,18 @@ * Copyright(c) 2018 Intel Corporation */ -#include - -#include #include #include +#include #include #include #include #include #include +#include #include "rte_latencystats.h" -/** Nano seconds per second */ -#define NS_PER_SEC 1E9 - -/** Clock cycles per nano second */ -static uint64_t -latencystat_cycles_per_ns(void) -{ - return rte_get_timer_hz() / NS_PER_SEC; -} - RTE_LOG_REGISTER_DEFAULT(latencystat_logtype, INFO); #define RTE_LOGTYPE_LATENCY_STATS latencystat_logtype #define LATENCY_STATS_LOG(level, ...) \ @@ -40,28 +29,32 @@ timestamp_dynfield(struct rte_mbuf *mbuf) timestamp_dynfield_offset, rte_mbuf_timestamp_t *); } -static const char *MZ_RTE_LATENCY_STATS = "rte_latencystats"; +static const char MZ_RTE_LATENCY_STATS[] = "rte_latencystats"; static int latency_stats_index; static uint64_t samp_intvl; -static uint64_t timer_tsc; -static uint64_t prev_tsc; +/* Per queue latency information (in cycles) */ struct rte_latency_stats { - float min_latency; /**< Minimum latency in nano seconds */ - float avg_latency; /**< Average latency in nano seconds */ - float max_latency; /**< Maximum latency in nano seconds */ - float jitter; /** Latency variation */ - rte_spinlock_t lock; /** Latency calculation lock */ -}; - -static struct rte_latency_stats *glob_stats; - -struct rxtx_cbs { + RTE_ATOMIC(uint64_t) min_latency; /* Minimum latency */ + RTE_ATOMIC(uint64_t) avg_latency; /* Average latency */ + RTE_ATOMIC(uint64_t) max_latency; /* Maximum latency */ + RTE_ATOMIC(uint64_t) jitter; /* Latency variation */ +} __rte_cache_aligned; + +/* per queue info stored in memxone */ +static struct { + struct rte_latency_stats stats[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; +} *latency_stats; + +static struct { + uint64_t prev_tsc; const struct rte_eth_rxtx_callback *cb; -}; +} rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; -static struct rxtx_cbs rx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; -static struct rxtx_cbs tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; +static struct { + uint64_t prev_latency; + const struct rte_eth_rxtx_callback *cb; +} tx_cbs[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT]; struct latency_stats_nameoff { char name[RTE_ETH_XSTATS_NAME_SIZE]; @@ -75,51 +68,92 @@ static const struct latency_stats_nameoff lat_stats_strings[] = { {"jitter_ns", offsetof(struct rte_latency_stats, jitter)}, }; -#define NUM_LATENCY_STATS (sizeof(lat_stats_strings) / \ - sizeof(lat_stats_strings[0])) +#define NUM_LATENCY_STATS RTE_DIM(lat_stats_strings) -int32_t -rte_latencystats_update(void) +static inline uint64_t +cycles_to_ns(uint64_t cycles) { - unsigned int i; - float *stats_ptr = NULL; - uint64_t values[NUM_LATENCY_STATS] = {0}; - int ret; - - for (i = 0; i < NUM_LATENCY_STATS; i++) { - stats_ptr = RTE_PTR_ADD(glob_stats, - lat_stats_strings[i].offset); - values[i] = (uint64_t)floor((*stats_ptr)/ - latencystat_cycles_per_ns()); - } + return (cycles * NSEC_PER_SEC) / rte_get_tsc_hz(); +} - ret = rte_metrics_update_values(RTE_METRICS_GLOBAL, - latency_stats_index, - values, NUM_LATENCY_STATS); - if (ret < 0) - LATENCY_STATS_LOG(INFO, "Failed to push the stats"); +static inline uint64_t +latencystat_read_ns(__rte_atomic const uint64_t *stat_ptr) +{ + return cycles_to_ns(rte_atomic_load_explicit(stat_ptr, rte_memory_order_relaxed)); +} - return ret; +static inline void +latencystat_write(__rte_atomic uint64_t *stat_ptr, uint64_t value) +{ + rte_atomic_store_explicit(stat_ptr, value, rte_memory_order_relaxed); } +/* aggregate data across all ports and queues */ static void -rte_latencystats_fill_values(struct rte_metric_value *values) +latencystats_collect(uint64_t *values) { - unsigned int i; - float *stats_ptr = NULL; + unsigned int i, samples = 0; + uint16_t pid, qid; + int ret; + struct rte_latency_stats sum = { }; + + RTE_ETH_FOREACH_DEV(pid) { + struct rte_eth_dev_info dev_info; + + ret = rte_eth_dev_info_get(pid, &dev_info); + if (ret != 0) { + LATENCY_STATS_LOG(ERR, + "Error during getting device (port %u) info: %s", + pid, strerror(-ret)); + continue; + } + + for (qid = 0; qid < dev_info.nb_tx_queues; qid++) { + const struct rte_latency_stats *stats = &latency_stats->stats[pid][qid]; + uint64_t l; + + l = latencystat_read_ns(&stats->min_latency); + if (l != 0 && (sum.min_latency == 0 || l < sum.min_latency)) + sum.min_latency = l; + + l = latencystat_read_ns(&stats->max_latency); + if (l < sum.max_latency) + sum.max_latency = l; + + sum.avg_latency += latencystat_read_ns(&stats->avg_latency); + sum.jitter += latencystat_read_ns(&stats->jitter); + ++samples; + } + + } + /* adjust averages based on number of samples */ + if (samples > 0) { + sum.avg_latency /= samples; + sum.jitter /= samples; + } + + /* convert cycle counts to ns */ for (i = 0; i < NUM_LATENCY_STATS; i++) { - stats_ptr = RTE_PTR_ADD(glob_stats, - lat_stats_strings[i].offset); - values[i].key = i; - values[i].value = (uint64_t)floor((*stats_ptr)/ - latencystat_cycles_per_ns()); + uint64_t *stats_ptr = RTE_PTR_ADD(&sum, lat_stats_strings[i].offset); + + values[i] = *stats_ptr; } } +int32_t +rte_latencystats_update(void) +{ + uint64_t values[NUM_LATENCY_STATS] = { 0 }; + + latencystats_collect(values); + + return rte_metrics_update_values(RTE_METRICS_GLOBAL, latency_stats_index, + values, NUM_LATENCY_STATS); +} + static uint16_t -add_time_stamps(uint16_t pid __rte_unused, - uint16_t qid __rte_unused, +add_time_stamps(uint16_t pid, uint16_t qid, struct rte_mbuf **pkts, uint16_t nb_pkts, uint16_t max_pkts __rte_unused, @@ -127,55 +161,49 @@ add_time_stamps(uint16_t pid __rte_unused, { unsigned int i; uint64_t diff_tsc, now; + uint64_t *prev_tsc = &rx_cbs[pid][qid].prev_tsc; /* * For every sample interval, * time stamp is marked on one received packet. */ - now = rte_rdtsc(); for (i = 0; i < nb_pkts; i++) { - diff_tsc = now - prev_tsc; - timer_tsc += diff_tsc; + if ((pkts[i]->ol_flags & timestamp_dynflag) != 0) + continue; - if ((pkts[i]->ol_flags & timestamp_dynflag) == 0 - && (timer_tsc >= samp_intvl)) { + now = rte_rdtsc(); + diff_tsc = now - *prev_tsc; + if (diff_tsc >= samp_intvl) { *timestamp_dynfield(pkts[i]) = now; pkts[i]->ol_flags |= timestamp_dynflag; - timer_tsc = 0; + *prev_tsc = now; + break; } - prev_tsc = now; - now = rte_rdtsc(); } return nb_pkts; } static uint16_t -calc_latency(uint16_t pid __rte_unused, - uint16_t qid __rte_unused, - struct rte_mbuf **pkts, - uint16_t nb_pkts, - void *_ __rte_unused) +calc_latency(uint16_t pid, uint16_t qid, + struct rte_mbuf **pkts, uint16_t nb_pkts, + void *user_cb __rte_unused) { - unsigned int i, cnt = 0; - uint64_t now; - float latency[nb_pkts]; - static float prev_latency; - /* - * Alpha represents degree of weighting decrease in EWMA, - * a constant smoothing factor between 0 and 1. The value - * is used below for measuring average latency. - */ - const float alpha = 0.2; + struct rte_latency_stats *stats = &latency_stats->stats[pid][qid]; + unsigned int i; + uint64_t now, *prev_latency; + prev_latency = &tx_cbs[pid][qid].prev_latency; now = rte_rdtsc(); for (i = 0; i < nb_pkts; i++) { - if (pkts[i]->ol_flags & timestamp_dynflag) - latency[cnt++] = now - *timestamp_dynfield(pkts[i]); - } + uint64_t latency; + int64_t delta; + + if ((pkts[i]->ol_flags & timestamp_dynflag) == 0) + continue; + + latency = now - *timestamp_dynfield(pkts[i]); - rte_spinlock_lock(&glob_stats->lock); - for (i = 0; i < cnt; i++) { /* * The jitter is calculated as statistical mean of interpacket * delay variation. The "jitter estimate" is computed by taking @@ -187,24 +215,25 @@ calc_latency(uint16_t pid __rte_unused, * Reference: Calculated as per RFC 5481, sec 4.1, * RFC 3393 sec 4.5, RFC 1889 sec. */ - glob_stats->jitter += (fabsf(prev_latency - latency[i]) - - glob_stats->jitter)/16; - if (glob_stats->min_latency == 0) - glob_stats->min_latency = latency[i]; - else if (latency[i] < glob_stats->min_latency) - glob_stats->min_latency = latency[i]; - else if (latency[i] > glob_stats->max_latency) - glob_stats->max_latency = latency[i]; + delta = *prev_latency - latency; + *prev_latency = latency; + latencystat_write(&stats->jitter, + stats->jitter + (delta - stats->jitter) / 16); + + if (stats->min_latency == 0 || latency < stats->min_latency) + latencystat_write(&stats->min_latency, latency); + else if (latency > stats->max_latency) + latencystat_write(&stats->max_latency, latency); + /* * The average latency is measured using exponential moving * average, i.e. using EWMA * https://en.wikipedia.org/wiki/Moving_average */ - glob_stats->avg_latency += - alpha * (latency[i] - glob_stats->avg_latency); - prev_latency = latency[i]; + delta = latency - stats->avg_latency; + latency = (delta + 3 * stats->avg_latency) / 4; + latencystat_write(&stats->avg_latency, latency); } - rte_spinlock_unlock(&glob_stats->lock); return nb_pkts; } @@ -214,38 +243,34 @@ rte_latencystats_init(uint64_t app_samp_intvl, rte_latency_stats_flow_type_fn user_cb) { unsigned int i; - uint16_t pid; - uint16_t qid; - struct rxtx_cbs *cbs = NULL; - const char *ptr_strings[NUM_LATENCY_STATS] = {0}; - const struct rte_memzone *mz = NULL; - const unsigned int flags = 0; + uint16_t pid, qid; + const char *ptr_strings[NUM_LATENCY_STATS]; + const struct rte_memzone *mz; int ret; if (rte_memzone_lookup(MZ_RTE_LATENCY_STATS)) return -EEXIST; - /** Allocate stats in shared memory fo multi process support */ - mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*glob_stats), - rte_socket_id(), flags); + /** Allocate stats in shared memory for multi process support */ + mz = rte_memzone_reserve(MZ_RTE_LATENCY_STATS, sizeof(*latency_stats), + rte_socket_id(), 0); if (mz == NULL) { LATENCY_STATS_LOG(ERR, "Cannot reserve memory: %s:%d", __func__, __LINE__); return -ENOMEM; } - glob_stats = mz->addr; - rte_spinlock_init(&glob_stats->lock); - samp_intvl = app_samp_intvl * latencystat_cycles_per_ns(); + latency_stats = mz->addr; + samp_intvl = (app_samp_intvl * NSEC_PER_SEC) / rte_get_tsc_hz(); - /** Register latency stats with stats library */ + /* Register latency stats with stats library */ for (i = 0; i < NUM_LATENCY_STATS; i++) ptr_strings[i] = lat_stats_strings[i].name; latency_stats_index = rte_metrics_reg_names(ptr_strings, NUM_LATENCY_STATS); if (latency_stats_index < 0) { - LATENCY_STATS_LOG(DEBUG, + LATENCY_STATS_LOG(ERR, "Failed to register latency stats names"); return -1; } @@ -262,10 +287,11 @@ rte_latencystats_init(uint64_t app_samp_intvl, /** Register Rx/Tx callbacks */ RTE_ETH_FOREACH_DEV(pid) { struct rte_eth_dev_info dev_info; + const struct rte_eth_rxtx_callback *cb; ret = rte_eth_dev_info_get(pid, &dev_info); if (ret != 0) { - LATENCY_STATS_LOG(INFO, + LATENCY_STATS_LOG(NOTICE, "Error during getting device (port %u) info: %s", pid, strerror(-ret)); @@ -273,23 +299,25 @@ rte_latencystats_init(uint64_t app_samp_intvl, } for (qid = 0; qid < dev_info.nb_rx_queues; qid++) { - cbs = &rx_cbs[pid][qid]; - cbs->cb = rte_eth_add_first_rx_callback(pid, qid, - add_time_stamps, user_cb); - if (!cbs->cb) - LATENCY_STATS_LOG(INFO, "Failed to " - "register Rx callback for pid=%d, " - "qid=%d", pid, qid); + cb = rte_eth_add_first_rx_callback(pid, qid, add_time_stamps, user_cb); + if (cb) + rx_cbs[pid][qid].cb = cb; + else + LATENCY_STATS_LOG(NOTICE, + "Failed to register Rx callback for pid=%d, qid=%d", + pid, qid); } + for (qid = 0; qid < dev_info.nb_tx_queues; qid++) { - cbs = &tx_cbs[pid][qid]; - cbs->cb = rte_eth_add_tx_callback(pid, qid, - calc_latency, user_cb); - if (!cbs->cb) - LATENCY_STATS_LOG(INFO, "Failed to " - "register Tx callback for pid=%d, " - "qid=%d", pid, qid); + cb = rte_eth_add_tx_callback(pid, qid, calc_latency, user_cb); + if (cb) + tx_cbs[pid][qid].cb = cb; + else + LATENCY_STATS_LOG(NOTICE, + "Failed to register Tx callback for pid=%d, qid=%d", + pid, qid); } + } return 0; } @@ -297,19 +325,18 @@ rte_latencystats_init(uint64_t app_samp_intvl, int rte_latencystats_uninit(void) { - uint16_t pid; - uint16_t qid; - int ret = 0; - struct rxtx_cbs *cbs = NULL; - const struct rte_memzone *mz = NULL; + const struct rte_memzone *mz; + uint16_t pid, qid; + int ret; /** De register Rx/Tx callbacks */ RTE_ETH_FOREACH_DEV(pid) { struct rte_eth_dev_info dev_info; + const struct rte_eth_rxtx_callback *cb; ret = rte_eth_dev_info_get(pid, &dev_info); if (ret != 0) { - LATENCY_STATS_LOG(INFO, + LATENCY_STATS_LOG(NOTICE, "Error during getting device (port %u) info: %s", pid, strerror(-ret)); @@ -317,20 +344,23 @@ rte_latencystats_uninit(void) } for (qid = 0; qid < dev_info.nb_rx_queues; qid++) { - cbs = &rx_cbs[pid][qid]; - ret = rte_eth_remove_rx_callback(pid, qid, cbs->cb); + cb = rx_cbs[pid][qid].cb; + if (cb == NULL) + continue; + + ret = rte_eth_remove_rx_callback(pid, qid, cb); if (ret) - LATENCY_STATS_LOG(INFO, "failed to " - "remove Rx callback for pid=%d, " - "qid=%d", pid, qid); + LATENCY_STATS_LOG(NOTICE, "Failed to remove Rx callback"); } + for (qid = 0; qid < dev_info.nb_tx_queues; qid++) { - cbs = &tx_cbs[pid][qid]; - ret = rte_eth_remove_tx_callback(pid, qid, cbs->cb); + cb = tx_cbs[pid][qid].cb; + if (cb == NULL) + continue; + + ret = rte_eth_remove_tx_callback(pid, qid, cb); if (ret) - LATENCY_STATS_LOG(INFO, "failed to " - "remove Tx callback for pid=%d, " - "qid=%d", pid, qid); + LATENCY_STATS_LOG(NOTICE, "Failed to remove Tx callback"); } } @@ -360,6 +390,9 @@ rte_latencystats_get_names(struct rte_metric_name *names, uint16_t size) int rte_latencystats_get(struct rte_metric_value *values, uint16_t size) { + unsigned int i; + uint64_t stats[NUM_LATENCY_STATS]; + if (size < NUM_LATENCY_STATS || values == NULL) return NUM_LATENCY_STATS; @@ -371,11 +404,18 @@ rte_latencystats_get(struct rte_metric_value *values, uint16_t size) "Latency stats memzone not found"); return -ENOMEM; } - glob_stats = mz->addr; + + latency_stats = mz->addr; } /* Retrieve latency stats */ - rte_latencystats_fill_values(values); + latencystats_collect(stats); + + for (i = 0; i < NUM_LATENCY_STATS; i++) { + values[i].key = i; + values[i].value = stats[i]; + } + return NUM_LATENCY_STATS; } -- 2.43.0