Date: Fri, 10 Nov 2017 10:12:19 +0100
From: Adrien Mazarguil
To: Marcelo Tosatti
Cc: dev@dpdk.org, Luiz Capitulino, Daniel Bristot de Oliveira
Subject: Re: [dpdk-dev] [PATCH] testpmd: add nanosleep in main loop
Message-ID: <20171110091219.GE24849@6wind.com>
In-Reply-To: <20171110060210.GA23340@amt.cnet>
List-Id: DPDK patches and discussions

Hi Marcelo,

On Fri, Nov 10, 2017 at 04:02:10AM -0200, Marcelo Tosatti wrote:
> 
> This patch allows a configurable pair of values to be set, which controls
> the frequency and length of a nanosleep call performed at test-pmd's
> iofwd main loop.
> 
> The problem is the following: it is necessary to execute code
> on isolated CPUs which is not part of the packet forwarding load.
> 
> For example:
> 
> "echo val > /sys/kernel/debug/tracing/buffer_size_kb"
> 
> hangs the process, because the DPDK thread has higher
> priority than the workqueue thread which executes the flush from
> CPU local tracebuffer to CPU global trace buffer [the workitem
> in case].
> 
> There are more serious issues than the trace-cmd bug, such as XFS
> workitems failing to execute causing filesystem corruption.
> 
> To workaround this problem, until a proper kernel
> solution is developed, allow DPDK to nanosleep
> (hopefully with a small enough frequency and interval
> so that the performance is within acceptable levels).

I understand the need to do something about it, however the nanosleep()
approach seems questionable to me.

Testpmd's forwarding modes (particularly I/O) are used for benchmarking
purposes by many and are therefore sensitive to change. This code path is
currently free from system calls for that reason and nanosleep() is an
expensive one by definition. Even if optional or called at a low frequency,
the presence of this new code has an impact.

Since testpmd is a development tool not supposed to run in a production
environment, is there really a need for it to be patched to work around a
(temporary) Linux kernel bug?

If so, why is I/O the only forwarding mode impacted?

If it's used in a production environment and such a fix can't wait, have
other workarounds been considered:

- Replacing testpmd in I/O mode with a physical cable or switch?

- Using proper options on the kernel command line as described in [1], such
  as isolcpus, rcu_nocbs, nohz_full?

[1] doc/guides/howto/pvp_reference_benchmark.rst

> 
> The new parameters are:
> 
> * --delay-hz: sets nanosleep frequency in Hz.
> * --delay-length: sets nanosleep length in ns.
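
For scale, the two parameters can be turned into a nominal sleep budget. The
following is only a minimal sketch: the helper names are hypothetical and are
not part of the patch or of testpmd, and the arithmetic ignores system call
and wake-up overhead, so it gives a lower bound at best.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in testpmd. */

/* Nominal time spent sleeping per second of wall-clock time, in ns. */
static uint64_t sleep_ns_per_sec(uint32_t delay_hz, uint32_t delay_length_ns)
{
	return (uint64_t)delay_hz * delay_length_ns;
}

/* Packets arriving during a single sleep at a given offered rate. */
static uint64_t pkts_per_sleep(uint64_t rate_pps, uint32_t delay_length_ns)
{
	return rate_pps * delay_length_ns / 1000000000ULL;
}

/*
 * Timer cycles between two sleeps. Same division as
 * calc_nanosleep_interval() in the patch, but kept in 64 bits.
 */
static uint64_t cycles_between_sleeps(uint64_t timer_hz, uint32_t delay_hz)
{
	assert(delay_hz > 0); /* the patch only computes this when the frequency is > 0 */
	return timer_hz / delay_hz;
}
```

With the values quoted in this thread (delay-hz=100, delay-length=10000),
sleep_ns_per_sec() yields 1000000 ns, that is a nominal 0.1% of each second,
and pkts_per_sleep() at 2.3 Mpps yields 23 packets arriving while the
forwarding thread is asleep. Assuming a 2.5 GHz timer (an arbitrary figure
chosen for the example), cycles_between_sleeps() yields 25000000 cycles.
Unless interrupted by a signal, nanosleep() suspends the thread for at least
the requested time, so real figures can only be higher.
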
> 
> Results for delay-hz=100,delay-length=10000 (which allows
> the buffer_size_kb change to complete):
> 
> Baseline run-1:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 49505, Average:
> 19008.7 ns, StdDev: 2501.0 ns, Quartiles: 17293.0/18330.0/19901.0 ns
> 
> Baseline run-2:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 49606, Average:
> 19036.4 ns, StdDev: 2485.2 ns, Quartiles: 17318.0/18349.0/19936.0 ns
> 
> Baseline run-3:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 49627, Average:
> 19019.2 ns, StdDev: 2503.7 ns, Quartiles: 17323.0/18355.0/19940.0 ns
> 
> ============================
> 
> (10.000us, 100HZ)
> 
> Run-1:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 7284, Average:
> 20830.6 ns, StdDev: 12023.0 ns, Quartiles: 17309.0/18394.0/20233.0 ns
> 
> Run-2:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 6272, Average:
> 20897.1 ns, StdDev: 12057.2 ns, Quartiles: 17389.0/18457.0/20266.0 ns
> 
> Run-3:
> [Histogram port 0 to port 1 at rate 2.3 Mpps] Samples: 4843, Average:
> 20535.2 ns, StdDev: 9827.3 ns, Quartiles: 17389.0/18441.0/20269.0 ns
> 
> 
> Signed-off-by: Marcelo Tosatti
> 
> 
> diff -Nur dpdk-17.08.orig/app/test-pmd/iofwd.c dpdk-17.08/app/test-pmd/iofwd.c
> --- dpdk-17.08.orig/app/test-pmd/iofwd.c	2017-10-30 22:45:37.829492673 -0200
> +++ dpdk-17.08/app/test-pmd/iofwd.c	2017-10-30 22:45:48.321522581 -0200
> @@ -64,9 +64,30 @@
>  #include
>  #include
>  #include
> +#include
>  
>  #include "testpmd.h"
>  
> +uint32_t nanosleep_interval;
> +
> +static void calc_nanosleep_interval(int hz)
> +{
> +	uint64_t cycles_per_sec = rte_get_timer_hz();
> +	nanosleep_interval = cycles_per_sec/hz;
> +}
> +
> +static void do_nanosleep(void)
> +{
> +	struct timespec req;
> +
> +	req.tv_sec = 0;
> +	req.tv_nsec = nanosleep_length;
> +
> +	nanosleep(&req, NULL);
> +
> +	return;
> +}
> +
>  /*
>   * Forwarding of packets in I/O mode.
>   * Forward packets "as-is".
> @@ -81,6 +102,10 @@
>  	uint16_t nb_tx;
>  	uint32_t retry;
>  
> +
> +	if (nanosleep_interval == 0 && nanosleep_frequency > 0)
> +		calc_nanosleep_interval(nanosleep_frequency);
> +
>  #ifdef RTE_TEST_PMD_RECORD_CORE_CYCLES
>  	uint64_t start_tsc;
>  	uint64_t end_tsc;
> @@ -91,6 +116,12 @@
>  	start_tsc = rte_rdtsc();
>  #endif
>  
> +	if (nanosleep_frequency > 0 &&
> +	    rte_get_timer_cycles() > fs->next_nanosleep) {
> +		do_nanosleep();
> +		fs->next_nanosleep = rte_get_timer_cycles() + nanosleep_interval;
> +	}
> +
>  	/*
>  	 * Receive a burst of packets and forward them.
>  	 */
> diff -Nur dpdk-17.08.orig/app/test-pmd/parameters.c dpdk-17.08/app/test-pmd/parameters.c
> --- dpdk-17.08.orig/app/test-pmd/parameters.c	2017-10-30 22:45:37.830492676 -0200
> +++ dpdk-17.08/app/test-pmd/parameters.c	2017-10-30 22:46:33.708651912 -0200
> @@ -216,6 +216,8 @@
>  	       "disable print of designated event or all of them.\n");
>  	printf(" --flow-isolate-all: "
>  	       "requests flow API isolated mode on all ports at initialization time.\n");
> +	printf(" --delay-hz: sets nanosleep frequency in Hz.\n");
> +	printf(" --delay-length: sets nanosleep length in ns.\n");
>  }
>  
>  #ifdef RTE_LIBRTE_CMDLINE
> @@ -638,7 +640,9 @@
>  		{ "no-rmv-interrupt", 0, 0, 0 },
>  		{ "print-event", 1, 0, 0 },
>  		{ "mask-event", 1, 0, 0 },
> -		{ 0, 0, 0, 0 },
> +		{ "delay-hz", 1, 0, 0 },
> +		{ "delay-length", 1, 0, 0 },
> +		{ 0, 0, 0, 0 },
>  	};
>  
>  	argvopt = argv;
> @@ -1099,6 +1103,27 @@
>  				else
>  					rte_exit(EXIT_FAILURE, "bad txpkts\n");
>  			}
> +
> +			if (!strcmp(lgopts[opt_idx].name, "delay-hz")) {
> +				int n;
> +
> +				n = atoi(optarg);
> +
> +				if (n < 0)
> +					rte_exit(EXIT_FAILURE, "bad delay-hz\n");
> +				nanosleep_frequency = n;
> +			}
> +
> +			if (!strcmp(lgopts[opt_idx].name, "delay-length")) {
> +				int n;
> +
> +				n = atoi(optarg);
> +
> +				if (n < 0)
> +					rte_exit(EXIT_FAILURE, "bad delay-length\n");
> +				nanosleep_length = n;
> +			}
> +
>  			if (!strcmp(lgopts[opt_idx].name, "no-flush-rx"))
>  				no_flush_rx = 1;
>  			if (!strcmp(lgopts[opt_idx].name, "disable-link-check"))
> diff -Nur dpdk-17.08.orig/app/test-pmd/testpmd.c dpdk-17.08/app/test-pmd/testpmd.c
> --- dpdk-17.08.orig/app/test-pmd/testpmd.c	2017-10-30 22:45:37.829492673 -0200
> +++ dpdk-17.08/app/test-pmd/testpmd.c	2017-10-30 22:45:48.323522591 -0200
> @@ -327,6 +327,13 @@
>  
>  #endif
>  
> +
> +/* How long to sleep in packet processing */
> +uint32_t nanosleep_length;
> +
> +/* How often to sleep in packet processing */
> +uint32_t nanosleep_frequency;
> +
>  /*
>   * Ethernet device configuration.
>   */
> diff -Nur dpdk-17.08.orig/app/test-pmd/testpmd.h dpdk-17.08/app/test-pmd/testpmd.h
> --- dpdk-17.08.orig/app/test-pmd/testpmd.h	2017-10-30 22:45:37.829492673 -0200
> +++ dpdk-17.08/app/test-pmd/testpmd.h	2017-10-30 22:45:48.323522591 -0200
> @@ -127,6 +127,7 @@
>  	struct pkt_burst_stats rx_burst_stats;
>  	struct pkt_burst_stats tx_burst_stats;
>  #endif
> +	uint64_t next_nanosleep;
>  };
>  
>  /** Offload IP checksum in csum forward engine */
> @@ -390,6 +391,9 @@
>  extern lcoreid_t latencystats_lcore_id;
>  #endif
>  
> +extern uint32_t nanosleep_length;
> +extern uint32_t nanosleep_frequency;
> +
>  #ifdef RTE_LIBRTE_BITRATE
>  extern lcoreid_t bitrate_lcore_id;
>  extern uint8_t bitrate_enabled;
> 
> 

-- 
Adrien Mazarguil
6WIND