From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 66BDFA04BC; Wed, 2 Sep 2020 16:23:47 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 480C01C0AE; Wed, 2 Sep 2020 16:23:47 +0200 (CEST) Received: from thoth.sbs.de (thoth.sbs.de [192.35.17.2]) by dpdk.org (Postfix) with ESMTP id 092EC137D for ; Wed, 2 Sep 2020 09:56:15 +0200 (CEST) Received: from mail2.sbs.de (mail2.sbs.de [192.129.41.66]) by thoth.sbs.de (8.15.2/8.15.2) with ESMTPS id 0827u9vx000562 (version=TLSv1.2 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Wed, 2 Sep 2020 09:56:13 +0200 Received: from md1za8fc.ad001.siemens.net ([167.87.29.156]) by mail2.sbs.de (8.15.2/8.15.2) with ESMTP id 0827u7lg010750; Wed, 2 Sep 2020 09:56:07 +0200 Date: Wed, 2 Sep 2020 09:56:05 +0200 From: Henning Schild To: Thomas Monjalon Cc: Felix Moessbauer , dev@dpdk.org, ktraynor@redhat.com, ali.rizvi@emumba.com, Muhammad Ahmad , david.hunt@intel.com, reshma.pattan@intel.com, konstantin.ananyev@intel.com, bruce.richardson@intel.com, ferruh.yigit@intel.com, jerinj@marvell.com, honnappa.nagarahalli@arm.com, olivier.matz@6wind.com, lijuan.tu@intel.com, Owen Hilyard , Morten =?UTF-8?B?QnLDuHJ1cA==?= , viacheslavo@nvidia.com, Stephen Hemminger , anatoly.burakov@intel.com Message-ID: <20200902095605.4c8e9373@md1za8fc.ad001.siemens.net> In-Reply-To: <7023104.8KzyT1xApx@thomas> References: <20200709092711.14900-3-felix.moessbauer@siemens.com> <20200715155409.9835-3-felix.moessbauer@siemens.com> <20200828162203.320d06da@md1za8fc.ad001.siemens.net> <7023104.8KzyT1xApx@thomas> X-Mailer: Claws Mail 3.17.6 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-Mailman-Approved-At: Wed, 02 Sep 2020 16:23:45 +0200 Subject: Re: [dpdk-dev] [PATCH v4 2/2] Add l2reflect measurement application X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Am Tue, 01 Sep 2020 17:07:38 +0200 schrieb Thomas Monjalon : > 28/08/2020 16:22, Henning Schild: > > Thomas, > > > > what is the state on this one? Any more steps to take or people to > > involve? > > I can try adding some key people in Cc list, > while doing a quick review. > > > > I first showed that in action back in 2016 on FOSDEM. > > https://archive.fosdem.org/2016/schedule/event/virt_iaas_real_time_cloud/ > > After my talk two devs from dpdk approached me because they liked > > the idea of a "network cyclictest". It took us some time to finally > > go upstream with it, unfortunately i do not recall names. > > I think I was one of them. > > There will be some talks about latency in the next virtual DPDK event: > https://events.linuxfoundation.org/dpdk-userspace-summit/program/schedule/ > (speakers are Cc'ed) Thanks, that sure looks like we are not the only ones who are interested in latency and probably a way to measure it. From other doing it as well it would be nice to hear how they currently do that. > > This application can potentially be integrated into the test-suite > > for QA, making sure no changes heavily increase the package > > transmission worst case timing. > > Good > > > > Felix Moessbauer wrote: > > > > > The l2reflect application implements a ping-pong benchmark to > > > measure the latency between two instances. For communication, > > > we use raw ethernet and send one packet at a time. The timing data > > > is collected locally and min/max/avg values are displayed in a > > > TUI. Finally, a histogram of the latencies is printed which can be > > > further processed with the jitterdebugger visualization scripts. > > Please can you show an example of script usage? > > > > To debug latency spikes, a max threshold can be defined. > > > If it is hit, a trace point is created on both instances. > > > > > > Signed-off-by: Felix Moessbauer > > > Signed-off-by: Henning Schild > [...] > > > --- a/app/Makefile > > > +++ b/app/Makefile > > No need to update Makefile, it will be removed. > > > > --- /dev/null > > > +++ b/app/l2reflect/l2reflect.h > > > @@ -0,0 +1,62 @@ > > > +/* > > > + * SPDX-License-Identifier: BSD-3-Clause > > SPDX should be the first line. > > > > + * Copyright(c) 2020 Siemens AG > > > + * > > > + * authors: > > > + * Felix Moessbauer > > > + * Henning Schild > > It is uncommon to mention authors in the file. > In general, git metadata covers this info. > Any special requirement? That application has some history. Written by me and eventually improved for upstream by Felix. The idea here was to give credit to two people with just one author field in git. If that is really in the way, we can drop it and make Felix the sole author. > [...] > > > + > > > +/* > > > + * Compares a packet destination MAC address to a device MAC > > > address. > > > + */ > > > +static __rte_always_inline int > > > +ether_addr_cmp(struct rte_ether_addr *ea, struct rte_ether_addr > > > *eb) +{ > > > + return ((*(uint64_t *)ea ^ *(uint64_t *)eb) & > > > MAC_ADDR_CMP) == 0; +} > > Please use rte_is_same_ether_addr() > > > > --- /dev/null > > > +++ b/app/l2reflect/main.c > > > @@ -0,0 +1,872 @@ > > > +/* > > > + * SPDX-License-Identifier: BSD-3-Clause > > > + * Copyright(c) 2020 Siemens AG > > > + * > > > + * authors: > > > + * Felix Moessbauer > > > + * Henning Schild > > > + * > > > + * launch (non-rt kernel): l2reflect --lcores 0@0,1@6 -n 1 > > > + * launch (rt kernel): l2reflect --lcores 0@0,1@6 -n 1 -- -P 50 > > > -r -l > > Would be better in a --help section or in the doc. > > > > + * > > > + * The l2reflect application implements a ping-pong benchmark to > > > + * measure the latency between two instances. For communication, > > > + * we use raw ethernet and send one packet at a time. The timing > > > data > > > + * is collected locally and min/max/avg values are displayed in a > > > TUI. > > > + * Finally, a histogram of the latencies is printed which can be > > > + * further processed with the jitterdebugger visualization > > > scripts. > > > + * To debug latency spikes, a max threshold can be defined. > > > + * If it is hit, a trace point is created on both instances. > > > + */ > [...] > > > +__rte_format_printf(1, 0) > > > +static void > > > +trace_write(const char *fmt, ...) > > > +{ > > > + va_list ap; > > > + char buf[256]; > > > + int n, err; > > > + > > > + if (trace_fd == 0) > > > + trace_fd = > > > open("/sys/kernel/debug/tracing/trace_marker", > > > + O_WRONLY); > > Why not using rte_trace framework? I am not sure where rte_trace ends up eventually. But the idea here is to have kernel traces when looking at latency peaks. You might just be looking at an incorrect host tuning when you see a spike i.e. higher prio threads on your possibly isolated core. It is not tracing dpdk, but more how it fits in the overall picture. > > > + if (trace_fd < 0) > > > + return; > > > + > > > + va_start(ap, fmt); > > > + n = vsnprintf(buf, 256, fmt, ap); > > > + va_end(ap); > > > + > > > + err = write(trace_fd, buf, n); > > > + assert(err >= 1); > > > +} > > [...] > > > +static void > > > +l2reflect_simple_forward(struct rte_mbuf *m) > > > +{ > > > + struct rte_ether_hdr *eth; > > > + struct my_magic_packet *pkt; > > > + > > > + eth = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); > > > + pkt = (struct my_magic_packet *)eth; > > > + > > > + /* dst addr */ > > > + rte_ether_addr_copy(ð->s_addr, ð->d_addr); > > > + /* src addr */ > > > + rte_ether_addr_copy(&l2reflect_port_eth_addr, > > > ð->s_addr); + > > > + if ((eth->ether_type == > > > rte_cpu_to_be_16(ETHER_TYPE_L2REFLECT)) && > > > + (pkt->magic == MAGIC_TRACE_PAYLOAD)) { > > > + /* and the native one */ > > > + trace_write("sending traced packet\n"); > > > + } > > > + > > > + l2reflect_send_packet(&m, l2reflect_port_number); > > > +} > > If I understand well, this requires a special cabling? > Would it be possible to measure latency of hardware NIC internal > loopback or software PMD loopback? Not sure. We always want to see the full picture ... i.e would also want to see if a NIC is slow to put a packet on the "wire". For testing we do use multi-port NICs with cable-loops. The other setup is Qemus with the application inside and a dpdk-based vhost switch. But that virtual setup is also not trivial. In fact we usually do make that switch send the packets over the cable-loop to again cover the whole way the packets go "in reality". > [...] > > > + printf("from MAC address: %02X:%02X:%02X:%02X:%02X:%02X > > > to" > > > + " %02X:%02X:%02X:%02X:%02X:%02X\n\n", > > > + eth->s_addr.addr_bytes[0], > > > eth->s_addr.addr_bytes[1], > > > + eth->s_addr.addr_bytes[2], > > > eth->s_addr.addr_bytes[3], > > > + eth->s_addr.addr_bytes[4], > > > eth->s_addr.addr_bytes[5], > > > + eth->d_addr.addr_bytes[0], > > > eth->d_addr.addr_bytes[1], > > > + eth->d_addr.addr_bytes[2], > > > eth->d_addr.addr_bytes[3], > > > + eth->d_addr.addr_bytes[4], > > > eth->d_addr.addr_bytes[5]); > > You could also use rte_ether_format_addr() > > [...] > > > +static inline unsigned int > > > +l2reflect_rx_filter( > > > + struct rte_mbuf **bufs, > > > + unsigned int nb_rx, > > > + unsigned int data_only) > > > +{ > > > + struct rte_ether_hdr *eth; > > > + struct my_magic_packet *pkt; > > > + unsigned int i, ret; > > > + > > > + ret = 0; > > > + for (i = 0; i < nb_rx; i++) { > > > + eth = rte_pktmbuf_mtod(bufs[i], struct > > > rte_ether_hdr *); > > > + if (l2reflect_state != S_ELECT_LEADER && > > > + !ether_addr_cmp(ð->s_addr, > > > &l2reflect_remote_eth_addr)) > > > + goto drop; > > > + > > > + if (eth->ether_type != > > > rte_cpu_to_be_16(ETHER_TYPE_L2REFLECT)) > > > + goto drop; > > > + > > > + pkt = (struct my_magic_packet *)eth; > > > + if (data_only && (pkt->type != TRACE_TYPE_DATA && > > > + pkt->type != TRACE_TYPE_RSET && > > > + pkt->type != TRACE_TYPE_QUIT)) > > > + goto drop; > > > + > > > + bufs[ret++] = bufs[i]; > > > + continue; > > > +drop: > > > + rte_pktmbuf_free(bufs[i]); > > > + } > > > + > > > + return ret; > > > +} > > This function deserves some comments to explain the logic. > > [...] > > > --- /dev/null > > > +++ b/app/l2reflect/meson.build > > > @@ -0,0 +1,25 @@ > > > +# SPDX-License-Identifier: BSD-3-Clause > > > +# Copyright(c) 2020 Siemens AG > > > + > > > +# meson file, for building this example as part of a main DPDK > > > build. +cc = meson.get_compiler('c') > > Not sure you need that line. > > > > + > > > +cjson = dependency('libcjson', required: false) > > Some other parts require jansson: > jansson = dependency('jansson', required: false) > How libcjson is different? Would it be possible to unify dependency? > > > > +if not is_linux > > > + build = false > > Why limiting to Linux? Probably because of the RT stuff (sched param setting, mlock) and the tracing. I would propose it like that and hope that others might ifdef when enabling other OSs. Maintaining low latency requires quite a bit of host tuning. This can get especially tricky when consuming the whole CPU with polling, you can starve your OS. The application alone does not do that for you, not even on Linux. Like cyclictest or jitterdebugger i would see it as a tool that requires knowledge to measure correctly. And it should probably be added as such, with an integration into automated tests and support for other OSs possibly coming in the future. I hope that others doing dpdk for low latency on preempt-rt will find it useful as is. They might have been looking for something like it, possibly have something like it in their test-labs and did not publish it yet. Henning > [...] > > > +#define ANSI_BOLD_RED "\x1b[01;31m" > > > +#define ANSI_BOLD_GREEN "\x1b[01;32m" > > > +#define ANSI_BOLD_YELLOW "\x1b[01;33m" > > > +#define ANSI_BOLD_BLUE "\x1b[01;34m" > > > +#define ANSI_BOLD_MAGENTA "\x1b[01;35m" > > > +#define ANSI_BOLD_CYAN "\x1b[01;36m" > > > +#define ANSI_RESET "\x1b[0m" > > Not sure about using colors. > If it really adds a value, I think it should be an option, > automatically disabled if stdout is redirected. > > [...] > > > --- a/app/meson.build > > > +++ b/app/meson.build > > > @@ -8,6 +8,7 @@ endif > > > apps = [ > > > 'pdump', > > > 'proc-info', > > > + 'l2reflect', > > > 'test-acl', > > > 'test-bbdev', > > > 'test-cmdline', > > I think you should keep alphabetical ordering. > > I'm not sure about adding this application in DPDK. > I think we need some latency tools, > but if it requires a specific setup, it could better fit in DTS. > We need more opinions here. > >