DPDK patches and discussions
 help / color / mirror / Atom feed
From: Bruce Richardson <bruce.richardson@intel.com>
To: Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>
Cc: dev@dpdk.org, Vipin Varghese <vipin.varghese@amd.com>,
	Ciara Power <ciara.power@intel.com>,
	Sivaprasad Tummala <Sivaprasad.Tummala@amd.com>,
	Tyler Retzlaff <roretzla@linux.microsoft.com>,
	Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>,
	Dmitry Malloy <dmitrym@microsoft.com>,
	Pallavi Kadam <pallavi.kadam@intel.com>
Subject: Re: [RFC] Telemetry enhancements and Windows support
Date: Mon, 4 Apr 2022 11:03:05 +0100	[thread overview]
Message-ID: <YkrCWUS6p4VHQfH8@bricha3-MOBL.ger.corp.intel.com> (raw)
In-Reply-To: <20220402015901.72798593@sovereign>

On Sat, Apr 02, 2022 at 01:59:01AM +0300, Dmitry Kozlyuk wrote:
> Vipin from AMD expressed demand for telemetry support on Windows
> in order to collect port statistics.
> Implementing a PoC, he stumbled several issues.
> Together we have designed a solution that eliminates these issues.
> It affects telemetry operation for all platforms, at least code-wise,
> We would like community input if this is acceptable
> or if there are some better suggestions.
> 
> Telemetry today (talking only about v2):
> 
> * Telemetry library starts a thread to listen for connections.
>   It is affinitized to the main lcore.
>   A thread is spawned to serve each client connection.
> 
> * A listening AF_UNIX socket is created in DPDK runtime directory.
>   This guarantees there will be no clash between sockets
>   created by different DPDK processes.
>   It has a fixed name within this directory, so it's trivial to discover.
> 
> * The socket is SOCK_SEQPACKET.
>   This allows the code to simply use write()
>   and let the socket preserve message bounds and order.
>   dpdk-telemetry.py relies on this, and probably external clients too.
>   #define MAX_OUTPUT_LEN (1024 * 16).
> 
> * The protocol is JSON objects, one per packet.
> 
> * Telemetry can be enabled (default) or disabled with an EAL option.
> 
> Windows issues:
> 
> * Threading API is implemented in EAL.
>   Currently this is a pthread.h shim,
>   there are plans to replace it with rte_thread API [1].
>   Hence, there's a circular dependency:
>   EAL -> telemetry -> threading API in EAL.
>   There's a similar issue logging called by this shim.
> 
>   Bruce pointed out that a similar issue is with EAL logs
>   and maybe logging should be moved its own library,
>   on which EAL would depend.
> 
> * There is no AF_UNIX and no simple way to select a unique free endpoint.
> 
> * There is no SOCK_SEQPACKET and SOCK_DGRAM size limitation is too small.
>   The only viable option is SOCK_STREAM,
>   but it does not preserve message boundaries.
> 
> Proposal:
> 
> * Move threading outside of telemetry, let EAL manage the threads.
> 
> 	eal_telemetry_client_thread(conn) {
> 		rte_telemetry_serve(conn);	
> 	}
> 
> 	eal_telemetry_server_thread(tm) {
> 		pthread_set_affinity_np(...);
> 		while (rte_telemetry_listen(tm)) {
> 			conn = rte_telemetry_accept(tm);
> 			pthread_create(eal_telemetry_client_thread,
> 			               conn);
> 		}
> 	}
> 
> 	rte_eal_init() {
> 		tm_logtype = rte_log_type_register();
> 		tm = rte_telemetry_init(internal_conf.tm_kvargs,
> 		                        tm_logtype);
> 		pthread_create(eal_telementry_server_thread, tm);
> 	}
> 
>   Among Vipin, Micorosft engineers, and me there is a consensus
>   that libraries should be passive and not create threads by themselves.
> 

Having EAL be the only one to create threads seems reasonable. However, I'm
a little uncertain about the scope of change and tying telemetry and EAL
together a lot more. Another alternative might be to change telemetry to
use only one thread for all connections and using "select" or "poll" to
wait on input from all sockets simultaneously. Would that work on Windows?

> * The logging issue can be solved differently:
> 
>   a) by factoring logging into a library, either as Bruce suggested;
>   b) by replacing logging with simple print in Windows shim;
>   c) by integrating and using the new threading API [1, 2].
> 

What is the issue with the current logging in telemetry on windows? The
current implementation has EAL pass in the logging function to telemetry on
init, allowing use of EAL logging without requiring a link-time dependency.
Theoretically, the same solution could also be used for the threading,
though I'm wary about doing so as using the same solution multiple times
could lead to very awkward APIs.

Overall, I tend to think the proper solution for all this is to split EAL -
something that was discussed previously but never done because it's
difficult and the result may be uncertain. Theoretically, most low-level
services in EAL, such as logging, should be in one library, while the EAL
init code remains in a later, higher-level one.

> * Allow to select listening endpoint and protocol.
>   Different options may be supported on each system and be the default.
>   rte_kvargs syntax and data structure can be used.
> 
> 	--telemetry
> 		Default settings. Also kept for compatibility.
> 	--telemetry transport=seqpacket,protocol=message
> 		AF_UNIX+SOCK_SEQPACKET, no delimiters between objects
> 		(same as default for Unices).
> 	--telemetry transport=tcp,protocol=line,endpoint=:64000
> 		Line-oriented JSON over a TCP socket at local port 64000
> 		(may be same as --telemetry <no argument> on Windows).
> 	--no-telemetry
> 		Disable telemetry. Default on Windows.
> 
>   Parameter names and set are subject to discussion.
> 
>   An last example if a possible future extension
>   as one of the reasons not to fix combinations of "transport="
>   and "protocol=" (similar to socket API):
> 
> 	--telemetry transport=pipe,protocol=line,endpoint=some-name
> 		Stream of objects over a named pipe (fifo), for example.
> 
>   Initial support should cover:
> 
>   1) "transport=seqpacket,protocol=message" on Unices;
>   2) "transport=tcp,protocol=line,endpoint=" at least on Windows,
>      but this variant is really applicable everywhere.
> 
> * dpdk-telemetry.py must support all combinations DPDK supports.
>   Defaults don't change, so backward compatibility is preserved.
> 
> * Because Windows cannot automatically select and endpoint that does not
>   conflict with anything, the default would be to disable telemetry.
>   We also propose to always log telemetry socket path.
> 
> * It is technically possible to accept remote connections.
>   However, it's a new attack surface and a security risk.
>   OTOH, there's no need to allow remote connections:
>   anyone who needs this can setup a tunnel or something.
>   Local socket must be used by default.
> 

Completely agree on not accepting non-local connections. However, I'd also
point out that another reason why a unix socket was chosen over other
TCP/UDP/etc socket options was that it also gave local-user protection.
Right now, if a DPDK process is run as root, no non-root user can connect
to the telemetry socket, unless root explicitly gives permissions. For TCP
or similar sockets, that would not be the case. Not sure how big a deal
that would be.

/Bruce

  parent reply	other threads:[~2022-04-04 10:03 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-01 22:59 Dmitry Kozlyuk
2022-04-02  0:55 ` Dmitry Kozlyuk
2022-04-04 10:03 ` Bruce Richardson [this message]
2022-04-06 21:50   ` Dmitry Kozlyuk
2022-04-06 21:57     ` Stephen Hemminger
2022-04-06 22:03       ` Dmitry Kozlyuk

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YkrCWUS6p4VHQfH8@bricha3-MOBL.ger.corp.intel.com \
    --to=bruce.richardson@intel.com \
    --cc=Sivaprasad.Tummala@amd.com \
    --cc=ciara.power@intel.com \
    --cc=dev@dpdk.org \
    --cc=dmitry.kozliuk@gmail.com \
    --cc=dmitrym@microsoft.com \
    --cc=navasile@linux.microsoft.com \
    --cc=pallavi.kadam@intel.com \
    --cc=roretzla@linux.microsoft.com \
    --cc=vipin.varghese@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).