From: David Marchand
Date: Thu, 9 Feb 2023 09:43:24 +0100
Subject: Re: [RESEND PATCH v9 4/5] app/testpmd: report lcore usage
To: Robin Jarry <rjarry@redhat.com>
Cc: dev@dpdk.org, Morten Brørup, Konstantin Ananyev, Kevin Laatz,
 Aman Singh, Yuying Zhang
In-Reply-To: <20230208084507.1328625-5-rjarry@redhat.com>
References: <20221123102612.1688865-1-rjarry@redhat.com>
 <20230208084507.1328625-1-rjarry@redhat.com>
 <20230208084507.1328625-5-rjarry@redhat.com>
List-Id: DPDK patches and discussions

On Wed, Feb 8, 2023 at 9:49 AM Robin Jarry <rjarry@redhat.com> wrote:
>
> The --record-core-cycles option already accounts for busy cycles. One
> turn of packet_fwd_t is considered "busy" if there was at least one
> received or transmitted packet.
>
> Rename core_cycles to busy_cycles in struct fwd_stream to make it more
> explicit. Add total_cycles to struct fwd_lcore. Add cycles accounting in
> noisy_vnf where it was missing.
>
> When --record-core-cycles is specified, register a callback with
> rte_lcore_register_usage_cb() and update total_cycles every turn of
> lcore loop based on a starting tsc value.
>
> In the callback, resolve the proper struct fwd_lcore based on lcore_id
> and return the lcore total_cycles and the sum of busy_cycles of all its
> fwd_streams.
>
> This makes the cycles counters available in rte_lcore_dump() and the
> lcore telemetry API:
>
> testpmd> dump_lcores
> lcore 3, socket 0, role RTE, cpuset 3
> lcore 4, socket 0, role RTE, cpuset 4, busy cycles 1228584096/9239923140
> lcore 5, socket 0, role RTE, cpuset 5, busy cycles 1255661768/9218141538
>
> --> /eal/lcore/info,4
> {
>   "/eal/lcore/info": {
>     "lcore_id": 4,
>     "socket": 0,
>     "role": "RTE",
>     "cpuset": [
>       4
>     ],
>     "busy_cycles": 10623340318,
>     "total_cycles": 55331167354
>   }
> }
>
> Signed-off-by: Robin Jarry <rjarry@redhat.com>
> Acked-by: Morten Brørup
> Acked-by: Konstantin Ananyev
> Reviewed-by: Kevin Laatz
> ---
>
> Notes:
>     v8 -> v9: Fixed accounting of total cycles
>
>  app/test-pmd/noisy_vnf.c |  8 +++++++-
>  app/test-pmd/testpmd.c   | 42 ++++++++++++++++++++++++++++++++++++----
>  app/test-pmd/testpmd.h   | 25 +++++++++++++++---------
>  3 files changed, 61 insertions(+), 14 deletions(-)
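
A side note to make the mechanism described in the commit log concrete,
since the corresponding testpmd.c hunk is not quoted below: the usage
callback would look roughly like the sketch that follows. This is only my
illustration, not necessarily the exact code of this patch. The callback
signature is the one from the lcore usage API added earlier in this
series, and fwd_lcores / fwd_lcores_cpuids / fwd_streams are existing
testpmd globals.

static int
lcore_usage_callback(unsigned int lcore_id, struct rte_lcore_usage *usage)
{
        struct fwd_stream **fsm;
        struct fwd_lcore *fc;
        streamid_t nb_fs;
        streamid_t sm_id;
        unsigned int i;

        /* Resolve the fwd_lcore running on this lcore_id. */
        for (i = 0; i < cur_fwd_config.nb_fwd_lcores; i++) {
                if (fwd_lcores_cpuids[i] != lcore_id)
                        continue;
                fc = fwd_lcores[i];
                usage->total_cycles = fc->total_cycles;
                /* Busy time is the sum over all streams of this lcore. */
                usage->busy_cycles = 0;
                fsm = &fwd_streams[fc->stream_idx];
                nb_fs = fc->stream_nb;
                for (sm_id = 0; sm_id < nb_fs; sm_id++) {
                        if (!fsm[sm_id]->disabled)
                                usage->busy_cycles += fsm[sm_id]->busy_cycles;
                }
                return 0;
        }
        /* Not a forwarding lcore: report no usage information. */
        return -1;
}

With --record-core-cycles set, a single
rte_lcore_register_usage_cb(lcore_usage_callback) call at init would then
wire it up, as I read the commit log.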
> diff --git a/app/test-pmd/noisy_vnf.c b/app/test-pmd/noisy_vnf.c
> index c65ec6f06a5c..ce5a3e5e6987 100644
> --- a/app/test-pmd/noisy_vnf.c
> +++ b/app/test-pmd/noisy_vnf.c
> @@ -144,6 +144,7 @@ pkt_burst_noisy_vnf(struct fwd_stream *fs)
>         struct noisy_config *ncf = noisy_cfg[fs->rx_port];
>         struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
>         struct rte_mbuf *tmp_pkts[MAX_PKT_BURST];
> +       uint64_t start_tsc = 0;
>         uint16_t nb_deqd = 0;
>         uint16_t nb_rx = 0;
>         uint16_t nb_tx = 0;
> @@ -153,6 +154,8 @@ pkt_burst_noisy_vnf(struct fwd_stream *fs)
>         bool needs_flush = false;
>         uint64_t now;
>
> +       get_start_cycles(&start_tsc);
> +
>         nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
>                         pkts_burst, nb_pkt_per_burst);
>         inc_rx_burst_stats(fs, nb_rx);
> @@ -169,7 +172,7 @@ pkt_burst_noisy_vnf(struct fwd_stream *fs)
>                 inc_tx_burst_stats(fs, nb_tx);
>                 fs->tx_packets += nb_tx;
>                 fs->fwd_dropped += drop_pkts(pkts_burst, nb_rx, nb_tx);
> -               return;
> +               goto end;
>         }
>
>         fifo_free = rte_ring_free_count(ncf->f);
> @@ -219,6 +222,9 @@ pkt_burst_noisy_vnf(struct fwd_stream *fs)
>                 fs->fwd_dropped += drop_pkts(tmp_pkts, nb_deqd, sent);
>                 ncf->prev_time = rte_get_timer_cycles();
>         }
> +end:
> +       if (nb_rx > 0 || nb_tx > 0)
> +               get_end_cycles(fs, start_tsc);
> }
>
> #define NOISY_STRSIZE 256
> diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
> index e366f81a0f46..eeb96aefa80b 100644
> --- a/app/test-pmd/testpmd.c
> +++ b/app/test-pmd/testpmd.c
> @@ -2053,7 +2053,7 @@ fwd_stats_display(void)
>                                 fs->rx_bad_outer_ip_csum;
>
>                 if (record_core_cycles)
> -                       fwd_cycles += fs->core_cycles;
> +                       fwd_cycles += fs->busy_cycles;
>         }
>         for (i = 0; i < cur_fwd_config.nb_fwd_ports; i++) {
>                 pt_id = fwd_ports_ids[i];
> @@ -2145,7 +2145,7 @@ fwd_stats_display(void)
>                 else
>                         total_pkts = total_recv;
>
> -               printf("\n  CPU cycles/packet=%.2F (total cycles="
> +               printf("\n  CPU cycles/packet=%.2F (busy cycles="
>                        "%"PRIu64" / total %s packets=%"PRIu64") at %"PRIu64
>                        " MHz Clock\n",
>                        (double) fwd_cycles / total_pkts,
> @@ -2184,8 +2184,10 @@ fwd_stats_reset(void)
>
>                 memset(&fs->rx_burst_stats, 0, sizeof(fs->rx_burst_stats));
>                 memset(&fs->tx_burst_stats, 0, sizeof(fs->tx_burst_stats));
> -               fs->core_cycles = 0;
> +               fs->busy_cycles = 0;
>         }
> +       for (i = 0; i < cur_fwd_config.nb_fwd_lcores; i++)
> +               fwd_lcores[i]->total_cycles = 0;

This instrumentation's accuracy may not matter much in testpmd (because
testpmd is just a test/validation tool). However, resetting total_cycles
sets a bad example for people who may look at this code: it does not
comply with the EAL API, since the associated lcores may still be running
at the moment a user resets the fwd stats.

> }
>
> static void
> @@ -2248,6 +2250,7 @@ static void
> run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
> {
>         struct fwd_stream **fsm;
> +       uint64_t start_tsc;
>         streamid_t nb_fs;
>         streamid_t sm_id;
> #ifdef RTE_LIB_BITRATESTATS
> @@ -2262,6 +2265,7 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
> #endif
>         fsm = &fwd_streams[fc->stream_idx];
>         nb_fs = fc->stream_nb;
> +       start_tsc = rte_rdtsc();
>         do {
>                 for (sm_id = 0; sm_id < nb_fs; sm_id++)
>                         if (!fsm[sm_id]->disabled)
> @@ -2284,10 +2288,36 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
>                         latencystats_lcore_id == rte_lcore_id())
>                         rte_latencystats_update();
> #endif
> -
> +       if (record_core_cycles)
> +               fc->total_cycles = rte_rdtsc() - start_tsc;

By using a single tsc reference taken at the start of this function,
total_cycles gets reset every time forwarding is stopped / restarted.
A more accurate way to account for the cycles consumed by this lcore
would be to add a differential value at each loop iteration. Like:

@@ -2248,6 +2248,7 @@ static void
 run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 {
        struct fwd_stream **fsm;
+       uint64_t prev_tsc;
        streamid_t nb_fs;
        streamid_t sm_id;
 #ifdef RTE_LIB_BITRATESTATS
@@ -2262,6 +2263,7 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
 #endif
        fsm = &fwd_streams[fc->stream_idx];
        nb_fs = fc->stream_nb;
+       prev_tsc = rte_rdtsc();
        do {
                for (sm_id = 0; sm_id < nb_fs; sm_id++)
                        if (!fsm[sm_id]->disabled)
@@ -2285,9 +2287,42 @@ run_pkt_fwd_on_lcore(struct fwd_lcore *fc, packet_fwd_t pkt_fwd)
                        rte_latencystats_update();
 #endif

+               if (record_core_cycles) {
+                       uint64_t current_tsc = rte_rdtsc();
+
+                       fc->total_cycles += current_tsc - prev_tsc;
+                       prev_tsc = current_tsc;
+               }
        } while (! fc->stopped);
 }

I also have one question about those updates. I wonder if we are missing
some __atomic_store/load pairs (probably more of an issue for non-x86
arches), since updating and reading those cycle counters happen on
different threads.
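To illustrate what I mean, only as a sketch on top of the diff above
(relaxed ordering is an assumption on my side: it would at least make the
64-bit accesses tear-free on 32-bit and weakly ordered architectures,
whether something stronger is needed deserves more thought):

/* Writer side, in the lcore loop: publish the counter with an atomic
 * store instead of a plain assignment. The plain read of
 * fc->total_cycles is fine here, this thread is the only writer. */
uint64_t current_tsc = rte_rdtsc();

__atomic_store_n(&fc->total_cycles,
                fc->total_cycles + (current_tsc - prev_tsc),
                __ATOMIC_RELAXED);
prev_tsc = current_tsc;

/* Reader side, e.g. in the stats or telemetry code: */
uint64_t total_cycles = __atomic_load_n(&fc->total_cycles, __ATOMIC_RELAXED);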
This issue predates your patch (it was already the case for the
fs->core_cycles accesses). I am not asking for a fix right away; this
last point can wait until post-rc1.


-- 
David Marchand