From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 1A0C7A0032; Mon, 11 Jul 2022 11:47:21 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 67B7E40DF6; Mon, 11 Jul 2022 11:47:20 +0200 (CEST) Received: from mail-wr1-f44.google.com (mail-wr1-f44.google.com [209.85.221.44]) by mails.dpdk.org (Postfix) with ESMTP id 61CDC40684 for ; Mon, 11 Jul 2022 11:47:19 +0200 (CEST) Received: by mail-wr1-f44.google.com with SMTP id h17so6297955wrx.0 for ; Mon, 11 Jul 2022 02:47:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=6wind.com; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=4r9mDbEcp9hhcNpxUAYdx3GsID5kkYAhjFI1NZxeUiA=; b=RaNjjCHNCgAvUQsaNn2C1DJnNTmwJqHAr5qFSmqeGTVnblnsUO3KX2R1ORNXkDqZvP cPNUI2wiY166XlKfeVOzyaek/zpEkF4/ql+FK1ckmh0dXbMkjzjKBMtH196sjxt0Ddlf Yzhvlsqnd26p9R/Ef80gg58d5w95wcy4lbUaqQbvxcXNwKJHRsPhj2MWMwnQ3Eg87y8/ X7PvSCeEit37DtT4/HUR6qNC5QK/KKmnOLNsY78rH9Kz9jT9ozWZuN8VFaM0op6MEx9I 3Ov2kCH1oc07ms20dxTr1fcK9QfokudugUCjbaBzMsdHDz+zfMJFzRWwVgN2hj08SAhA O3MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=4r9mDbEcp9hhcNpxUAYdx3GsID5kkYAhjFI1NZxeUiA=; b=X9CskAYhmnswuOCIfMbIu72WGJXPMNqk27lTezZbIO/2uVLDIjUm4oT1HIWicobkyd fG0Qirdg6ZDhNjSHH//L6KvPBmkXo8pfJNDQlEDLFNlsSBWyE2VQBcN8/Ri2ixOK+Xbp 8hy7Ug7attE3S9fTtna84kfGtMu+Yfk40eW/ckVv8mxhYzOW7EU9gg2HZH7zBKQ2qSS0 f97BKmArZA4QgfuQMwXqFoF7AG8ahh8KZ7xVvLGzUgi76FAE0DvbZXWsGGo9GHXaJIaP 7PiT5rXZITuGzIiRAwlb5JroMVGC6JuxsJeOEb0FugEbFBy+YBOD3yMsZyFf7cn25OPI uecQ== X-Gm-Message-State: AJIora/VQ1Ht3/XKfLQHoRrJ5R65f3aTkcN3b4PnPc6Ry0xznVuL2jzr Bzyqk/14b5FGj1UAcSxgFad9ww== X-Google-Smtp-Source: AGRyM1swaE5inpTyaRe11IZtdlmuQlC/b0HGqFubmYzegOTUhAiEv5wPeWniSG/SoCpIgQfVUIUgfw== X-Received: by 2002:adf:ce0c:0:b0:21d:929e:1522 with SMTP id p12-20020adfce0c000000b0021d929e1522mr13296697wrn.126.1657532839016; Mon, 11 Jul 2022 02:47:19 -0700 (PDT) Received: from 6wind.com ([2a01:e0a:5ac:6460:c065:401d:87eb:9b25]) by smtp.gmail.com with ESMTPSA id f13-20020a05600c154d00b003a04d19dab3sm24521475wmg.3.2022.07.11.02.47.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 11 Jul 2022 02:47:18 -0700 (PDT) Date: Mon, 11 Jul 2022 11:47:17 +0200 From: Olivier Matz To: Mattias =?iso-8859-1?Q?R=F6nnblom?= Cc: Emil Berg , bruce.richardson@intel.com, stephen@networkplumber.org, stable@dpdk.org, bugzilla@dpdk.org, dev@dpdk.org, onar.olsen@ericsson.com, Morten =?iso-8859-1?Q?Br=F8rup?= Subject: Re: [PATCH v2 1/2] app/test: add cksum performance test Message-ID: References: <6839721a-8050-0e11-0c66-0f735ec8c56d@ericsson.com> <20220708125608.24532-1-mattias.ronnblom@ericsson.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20220708125608.24532-1-mattias.ronnblom@ericsson.com> X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Hi Mattias, Please see few comments below. On Fri, Jul 08, 2022 at 02:56:07PM +0200, Mattias Rönnblom wrote: > Add performance test for the rte_raw_cksum() function, which delegates > the actual work to __rte_raw_cksum(), which in turn is used by other > functions in need of Internet checksum calculation. > > Signed-off-by: Mattias Rönnblom > > --- > > v2: > * Added __rte_unused to unused volatile variable, to keep the Intel > compiler happy. > --- > MAINTAINERS | 1 + > app/test/meson.build | 1 + > app/test/test_cksum_perf.c | 118 +++++++++++++++++++++++++++++++++++++ > 3 files changed, 120 insertions(+) > create mode 100644 app/test/test_cksum_perf.c > > diff --git a/MAINTAINERS b/MAINTAINERS > index c923712946..2a4c99e05a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1414,6 +1414,7 @@ Network headers > M: Olivier Matz > F: lib/net/ > F: app/test/test_cksum.c > +F: app/test/test_cksum_perf.c > > Packet CRC > M: Jasvinder Singh > diff --git a/app/test/meson.build b/app/test/meson.build > index 431c5bd318..191db03d1d 100644 > --- a/app/test/meson.build > +++ b/app/test/meson.build > @@ -18,6 +18,7 @@ test_sources = files( > 'test_bpf.c', > 'test_byteorder.c', > 'test_cksum.c', > + 'test_cksum_perf.c', > 'test_cmdline.c', > 'test_cmdline_cirbuf.c', > 'test_cmdline_etheraddr.c', > diff --git a/app/test/test_cksum_perf.c b/app/test/test_cksum_perf.c > new file mode 100644 > index 0000000000..bff73cb3bb > --- /dev/null > +++ b/app/test/test_cksum_perf.c > @@ -0,0 +1,118 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2022 Ericsson AB > + */ > + > +#include > + > +#include > +#include > +#include > +#include > +#include > + > +#include "test.h" > + > +#define NUM_BLOCKS (10) > +#define ITERATIONS (1000000) Parenthesis can be safely removed > + > +static const size_t data_sizes[] = { 20, 21, 100, 101, 1500, 1501 }; > + > +static __rte_noinline uint16_t > +do_rte_raw_cksum(const void *buf, size_t len) > +{ > + return rte_raw_cksum(buf, len); > +} I don't understand the need to have this wrapper, especially marked __rte_noinline. What is the objective? Note that when I remove the __rte_noinline, the performance is better for size 20 and 21. > + > +static void > +init_block(void *buf, size_t len) Can buf be a (char *) instead? It would avoid a cast below. > +{ > + size_t i; > + > + for (i = 0; i < len; i++) > + ((char *)buf)[i] = (uint8_t)rte_rand(); > +} > + > +static int > +test_cksum_perf_size_alignment(size_t block_size, bool aligned) > +{ > + char *data[NUM_BLOCKS]; > + char *blocks[NUM_BLOCKS]; > + unsigned int i; > + uint64_t start; > + uint64_t end; > + /* Floating point to handle low (pseudo-)TSC frequencies */ > + double block_latency; > + double byte_latency; > + volatile __rte_unused uint64_t sum = 0; > + > + for (i = 0; i < NUM_BLOCKS; i++) { > + data[i] = rte_malloc(NULL, block_size + 1, 0); > + > + if (data[i] == NULL) { > + printf("Failed to allocate memory for block\n"); > + return TEST_FAILED; > + } > + > + init_block(data[i], block_size + 1); > + > + blocks[i] = aligned ? data[i] : data[i] + 1; > + } > + > + start = rte_rdtsc(); > + > + for (i = 0; i < ITERATIONS; i++) { > + unsigned int j; > + for (j = 0; j < NUM_BLOCKS; j++) > + sum += do_rte_raw_cksum(blocks[j], block_size); > + } > + > + end = rte_rdtsc(); > + > + block_latency = (end - start) / (double)(ITERATIONS * NUM_BLOCKS); > + byte_latency = block_latency / block_size; > + > + printf("%-9s %10zd %19.1f %16.2f\n", aligned ? "Aligned" : "Unaligned", > + block_size, block_latency, byte_latency); When I run the test on my dev machine, I get the following results, which are quite reproductible: Aligned 20 10.4 0.52 (range is 0.48 - 0.52) Unaligned 20 7.9 0.39 (range is 0.39 - 0.40) ... If I increase the number of iterations, the first results change significantly: Aligned 20 8.2 0.42 (range is 0.41 - 0.42) Unaligned 20 8.0 0.40 (always this value) To have more precise tests with small size, would it make sense to target a test time instead of an iteration count? Something like this: #define ITERATIONS 1000000 uint64_t iterations = 0; ... do { for (i = 0; i < ITERATIONS; i++) { unsigned int j; for (j = 0; j < NUM_BLOCKS; j++) sum += do_rte_raw_cksum(blocks[j], block_size); } iterations += ITERATIONS; end = rte_rdtsc(); } while ((end - start) < rte_get_tsc_hz()); block_latency = (end - start) / (double)(iterations * NUM_BLOCKS); After this change, the aligned and unaligned cases have the same performance on my machine. > + > + for (i = 0; i < NUM_BLOCKS; i++) > + rte_free(data[i]); > + > + return TEST_SUCCESS; > +} > + > +static int > +test_cksum_perf_size(size_t block_size) > +{ > + int rc; > + > + rc = test_cksum_perf_size_alignment(block_size, true); > + if (rc != TEST_SUCCESS) > + return rc; > + > + rc = test_cksum_perf_size_alignment(block_size, false); > + > + return rc; > +} > + > +static int > +test_cksum_perf(void) > +{ > + uint16_t i; > + > + printf("### rte_raw_cksum() performance ###\n"); > + printf("Alignment Block size TSC cycles/block TSC cycles/byte\n"); > + > + for (i = 0; i < RTE_DIM(data_sizes); i++) { > + int rc; > + > + rc = test_cksum_perf_size(data_sizes[i]); > + if (rc != TEST_SUCCESS) > + return rc; > + } > + > + return TEST_SUCCESS; > +} > + > + > +REGISTER_TEST_COMMAND(cksum_perf_autotest, test_cksum_perf); > + The last empty line can be removed. > -- > 2.25.1 >