Date: Sat, 2 Apr 2022 01:59:01 +0300
From: Dmitry Kozlyuk
To: dev@dpdk.org, Vipin Varghese, Ciara Power
Cc: Sivaprasad Tummala, Tyler Retzlaff, Narcisa Ana Maria Vasile, Dmitry Malloy, Pallavi Kadam, Bruce Richardson
Subject: [RFC] Telemetry enhancements and Windows support
Message-ID: <20220402015901.72798593@sovereign>

Vipin from AMD expressed demand for telemetry support on Windows
in order to collect port statistics.
Implementing a PoC, he stumbled upon several issues.
Together we have designed a solution that eliminates these issues.
It affects telemetry operation on all platforms, at least code-wise.
We would like community input on whether this is acceptable
or whether there are better suggestions.

Telemetry today (talking only about v2):

* Telemetry library starts a thread to listen for connections.
  It is affinitized to the main lcore.
  A thread is spawned to serve each client connection.
* A listening AF_UNIX socket is created in DPDK runtime directory.
  This guarantees there will be no clash between sockets
  created by different DPDK processes.
  It has a fixed name within this directory, so it's trivial to discover.
* The socket is SOCK_SEQPACKET.
  This allows the code to simply use write() and let the socket
  preserve message boundaries and order.
  dpdk-telemetry.py relies on this, and probably external clients too.
  The output length is capped: #define MAX_OUTPUT_LEN (1024 * 16).
* The protocol is JSON objects, one per packet.
* Telemetry can be enabled (default) or disabled with an EAL option.

Windows issues:

* Threading API is implemented in EAL.
  Currently this is a pthread.h shim,
  there are plans to replace it with rte_thread API [1].
  Hence, there's a circular dependency:
  EAL -> telemetry -> threading API in EAL.
  There's a similar issue with logging called by this shim.
  Bruce pointed out that EAL logs have a similar issue
  and that maybe logging should be moved to its own library,
  on which EAL would depend.
* There is no AF_UNIX and no simple way to select a unique free endpoint.
* There is no SOCK_SEQPACKET, and the SOCK_DGRAM size limit is too small.
  The only viable option is SOCK_STREAM,
  but it does not preserve message boundaries.

Proposal:

* Move threading outside of telemetry, let EAL manage the threads.

    eal_telemetry_client_thread(conn) {
        rte_telemetry_serve(conn);
    }

    eal_telemetry_server_thread(tm) {
        pthread_setaffinity_np(...);
        while (rte_telemetry_listen(tm)) {
            conn = rte_telemetry_accept(tm);
            pthread_create(eal_telemetry_client_thread, conn);
        }
    }

    rte_eal_init() {
        tm_logtype = rte_log_type_register();
        tm = rte_telemetry_init(internal_conf.tm_kvargs, tm_logtype);
        pthread_create(eal_telemetry_server_thread, tm);
    }

  Among Vipin, Microsoft engineers, and me there is a consensus
  that libraries should be passive and not create threads by themselves.
* The logging issue can be solved in several ways:
  a) by factoring logging into a library, as Bruce suggested;
  b) by replacing logging with simple print in Windows shim;
  c) by integrating and using the new threading API [1, 2].
* Allow to select listening endpoint and protocol.
  Each system may support different options and have a different default.
  rte_kvargs syntax and data structure can be used.

    --telemetry
        Default settings. Also kept for compatibility.

    --telemetry transport=seqpacket,protocol=message
        AF_UNIX+SOCK_SEQPACKET, no delimiters between objects
        (same as default for Unices).

    --telemetry transport=tcp,protocol=line,endpoint=:64000
        Line-oriented JSON over a TCP socket at local port 64000
        (may be same as --telemetry on Windows).

    --no-telemetry
        Disable telemetry. Default on Windows.

  Parameter names and set are subject to discussion.
  A last example is a possible future extension, one of the reasons
  not to fix combinations of "transport=" and "protocol="
  (similar to socket API):

    --telemetry transport=pipe,protocol=line,endpoint=some-name
        Stream of objects over a named pipe (fifo), for example.

  Initial support should cover:
  1) "transport=seqpacket,protocol=message" on Unices;
  2) "transport=tcp,protocol=line,endpoint=" at least on Windows,
     but this variant is really applicable everywhere.
* dpdk-telemetry.py must support all combinations DPDK supports.
  Defaults don't change, so backward compatibility is preserved.
* Because Windows cannot automatically select an endpoint
  that does not conflict with anything,
  the default would be to disable telemetry.
  We also propose to always log telemetry socket path.
* It is technically possible to accept remote connections.
  However, it's a new attack surface and a security risk.
  OTOH, there's no need to allow remote connections:
  anyone who needs this can set up a tunnel or something.
  Local socket must be used by default.

[1]: http://patchwork.dpdk.org/project/dpdk/list/?series=22319
[2]: http://patchwork.dpdk.org/project/dpdk/list/?series=20472
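
P.S. To illustrate why "protocol=line" is needed over SOCK_STREAM,
here is a rough sketch of the client-side framing that a tool like
dpdk-telemetry.py would have to do. It is not taken from any patch.
The class name LineJsonReader is made up for this example,
and it assumes the server writes each JSON object on a single line
terminated by "\n" (valid JSON never contains a raw newline inside
a string, so compact objects are safe to delimit this way).

```python
import json


class LineJsonReader:
    """Reassemble newline-delimited JSON objects from a byte stream.

    Hypothetical helper: a stream socket may deliver half an object
    or several objects in one recv(), so the client must buffer.
    """

    def __init__(self):
        self._buf = b""

    def feed(self, chunk):
        """Add received bytes and return all objects completed so far."""
        self._buf += chunk
        objects = []
        while True:
            line, sep, rest = self._buf.partition(b"\n")
            if not sep:
                # No complete line yet: keep the partial data buffered.
                break
            self._buf = rest
            if line.strip():
                objects.append(json.loads(line))
        return objects
```

A client loop would call feed() with whatever recv() returns and handle
each object as it completes. With "protocol=message" none of this is
needed, because SOCK_SEQPACKET delivers exactly one object per read.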