DPDK patches and discussions
 help / color / mirror / Atom feed
* [RFC] Telemetry enhancements and Windows support
@ 2022-04-01 22:59 Dmitry Kozlyuk
  2022-04-02  0:55 ` Dmitry Kozlyuk
  2022-04-04 10:03 ` Bruce Richardson
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-01 22:59 UTC (permalink / raw)
  To: dev, Vipin Varghese, Ciara Power
  Cc: Sivaprasad Tummala, Tyler Retzlaff, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam, Bruce Richardson

Vipin from AMD expressed demand for telemetry support on Windows
in order to collect port statistics.
Implementing a PoC, he stumbled several issues.
Together we have designed a solution that eliminates these issues.
It affects telemetry operation for all platforms, at least code-wise,
We would like community input if this is acceptable
or if there are some better suggestions.

Telemetry today (talking only about v2):

* Telemetry library starts a thread to listen for connections.
  It is affinitized to the main lcore.
  A thread is spawned to serve each client connection.

* A listening AF_UNIX socket is created in DPDK runtime directory.
  This guarantees there will be no clash between sockets
  created by different DPDK processes.
  It has a fixed name within this directory, so it's trivial to discover.

* The socket is SOCK_SEQPACKET.
  This allows the code to simply use write()
  and let the socket preserve message bounds and order.
  dpdk-telemetry.py relies on this, and probably external clients too.
  #define MAX_OUTPUT_LEN (1024 * 16).

* The protocol is JSON objects, one per packet.

* Telemetry can be enabled (default) or disabled with an EAL option.

Windows issues:

* Threading API is implemented in EAL.
  Currently this is a pthread.h shim,
  there are plans to replace it with rte_thread API [1].
  Hence, there's a circular dependency:
  EAL -> telemetry -> threading API in EAL.
  There's a similar issue logging called by this shim.

  Bruce pointed out that a similar issue is with EAL logs
  and maybe logging should be moved its own library,
  on which EAL would depend.

* There is no AF_UNIX and no simple way to select a unique free endpoint.

* There is no SOCK_SEQPACKET and SOCK_DGRAM size limitation is too small.
  The only viable option is SOCK_STREAM,
  but it does not preserve message boundaries.

Proposal:

* Move threading outside of telemetry, let EAL manage the threads.

	eal_telemetry_client_thread(conn) {
		rte_telemetry_serve(conn);	
	}

	eal_telemetry_server_thread(tm) {
		pthread_set_affinity_np(...);
		while (rte_telemetry_listen(tm)) {
			conn = rte_telemetry_accept(tm);
			pthread_create(eal_telemetry_client_thread,
			               conn);
		}
	}

	rte_eal_init() {
		tm_logtype = rte_log_type_register();
		tm = rte_telemetry_init(internal_conf.tm_kvargs,
		                        tm_logtype);
		pthread_create(eal_telementry_server_thread, tm);
	}

  Among Vipin, Micorosft engineers, and me there is a consensus
  that libraries should be passive and not create threads by themselves.

* The logging issue can be solved differently:

  a) by factoring logging into a library, either as Bruce suggested;
  b) by replacing logging with simple print in Windows shim;
  c) by integrating and using the new threading API [1, 2].

* Allow to select listening endpoint and protocol.
  Different options may be supported on each system and be the default.
  rte_kvargs syntax and data structure can be used.

	--telemetry
		Default settings. Also kept for compatibility.
	--telemetry transport=seqpacket,protocol=message
		AF_UNIX+SOCK_SEQPACKET, no delimiters between objects
		(same as default for Unices).
	--telemetry transport=tcp,protocol=line,endpoint=:64000
		Line-oriented JSON over a TCP socket at local port 64000
		(may be same as --telemetry <no argument> on Windows).
	--no-telemetry
		Disable telemetry. Default on Windows.

  Parameter names and set are subject to discussion.

  An last example if a possible future extension
  as one of the reasons not to fix combinations of "transport="
  and "protocol=" (similar to socket API):

	--telemetry transport=pipe,protocol=line,endpoint=some-name
		Stream of objects over a named pipe (fifo), for example.

  Initial support should cover:

  1) "transport=seqpacket,protocol=message" on Unices;
  2) "transport=tcp,protocol=line,endpoint=" at least on Windows,
     but this variant is really applicable everywhere.

* dpdk-telemetry.py must support all combinations DPDK supports.
  Defaults don't change, so backward compatibility is preserved.

* Because Windows cannot automatically select and endpoint that does not
  conflict with anything, the default would be to disable telemetry.
  We also propose to always log telemetry socket path.

* It is technically possible to accept remote connections.
  However, it's a new attack surface and a security risk.
  OTOH, there's no need to allow remote connections:
  anyone who needs this can setup a tunnel or something.
  Local socket must be used by default.

[1]: http://patchwork.dpdk.org/project/dpdk/list/?series=22319
[2]: http://patchwork.dpdk.org/project/dpdk/list/?series=20472

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Telemetry enhancements and Windows support
  2022-04-01 22:59 [RFC] Telemetry enhancements and Windows support Dmitry Kozlyuk
@ 2022-04-02  0:55 ` Dmitry Kozlyuk
  2022-04-04 10:03 ` Bruce Richardson
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-02  0:55 UTC (permalink / raw)
  To: dev, Vipin Varghese, Ciara Power
  Cc: Sivaprasad Tummala, Tyler Retzlaff, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam, Bruce Richardson

2022-04-02 01:59 (UTC+0300), Dmitry Kozlyuk:
> Telemetry today (talking only about v2):
> 
> * Telemetry library starts a thread to listen for connections.
>   It is affinitized to the main lcore.

Should be: "to the control thread CPU set".

>   A thread is spawned to serve each client connection.

Note: the number of active connections is limited to MAX_CONNECTIONS = 10.
No change is proposed here, although the new kvargs syntax
allows to add telemetry tunables easily.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Telemetry enhancements and Windows support
  2022-04-01 22:59 [RFC] Telemetry enhancements and Windows support Dmitry Kozlyuk
  2022-04-02  0:55 ` Dmitry Kozlyuk
@ 2022-04-04 10:03 ` Bruce Richardson
  2022-04-06 21:50   ` Dmitry Kozlyuk
  1 sibling, 1 reply; 6+ messages in thread
From: Bruce Richardson @ 2022-04-04 10:03 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: dev, Vipin Varghese, Ciara Power, Sivaprasad Tummala,
	Tyler Retzlaff, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam

On Sat, Apr 02, 2022 at 01:59:01AM +0300, Dmitry Kozlyuk wrote:
> Vipin from AMD expressed demand for telemetry support on Windows
> in order to collect port statistics.
> Implementing a PoC, he stumbled several issues.
> Together we have designed a solution that eliminates these issues.
> It affects telemetry operation for all platforms, at least code-wise,
> We would like community input if this is acceptable
> or if there are some better suggestions.
> 
> Telemetry today (talking only about v2):
> 
> * Telemetry library starts a thread to listen for connections.
>   It is affinitized to the main lcore.
>   A thread is spawned to serve each client connection.
> 
> * A listening AF_UNIX socket is created in DPDK runtime directory.
>   This guarantees there will be no clash between sockets
>   created by different DPDK processes.
>   It has a fixed name within this directory, so it's trivial to discover.
> 
> * The socket is SOCK_SEQPACKET.
>   This allows the code to simply use write()
>   and let the socket preserve message bounds and order.
>   dpdk-telemetry.py relies on this, and probably external clients too.
>   #define MAX_OUTPUT_LEN (1024 * 16).
> 
> * The protocol is JSON objects, one per packet.
> 
> * Telemetry can be enabled (default) or disabled with an EAL option.
> 
> Windows issues:
> 
> * Threading API is implemented in EAL.
>   Currently this is a pthread.h shim,
>   there are plans to replace it with rte_thread API [1].
>   Hence, there's a circular dependency:
>   EAL -> telemetry -> threading API in EAL.
>   There's a similar issue logging called by this shim.
> 
>   Bruce pointed out that a similar issue is with EAL logs
>   and maybe logging should be moved its own library,
>   on which EAL would depend.
> 
> * There is no AF_UNIX and no simple way to select a unique free endpoint.
> 
> * There is no SOCK_SEQPACKET and SOCK_DGRAM size limitation is too small.
>   The only viable option is SOCK_STREAM,
>   but it does not preserve message boundaries.
> 
> Proposal:
> 
> * Move threading outside of telemetry, let EAL manage the threads.
> 
> 	eal_telemetry_client_thread(conn) {
> 		rte_telemetry_serve(conn);	
> 	}
> 
> 	eal_telemetry_server_thread(tm) {
> 		pthread_set_affinity_np(...);
> 		while (rte_telemetry_listen(tm)) {
> 			conn = rte_telemetry_accept(tm);
> 			pthread_create(eal_telemetry_client_thread,
> 			               conn);
> 		}
> 	}
> 
> 	rte_eal_init() {
> 		tm_logtype = rte_log_type_register();
> 		tm = rte_telemetry_init(internal_conf.tm_kvargs,
> 		                        tm_logtype);
> 		pthread_create(eal_telementry_server_thread, tm);
> 	}
> 
>   Among Vipin, Micorosft engineers, and me there is a consensus
>   that libraries should be passive and not create threads by themselves.
> 

Having EAL be the only one to create threads seems reasonable. However, I'm
a little uncertain about the scope of change and tying telemetry and EAL
together a lot more. Another alternative might be to change telemetry to
use only one thread for all connections and using "select" or "poll" to
wait on input from all sockets simultaneously. Would that work on Windows?

> * The logging issue can be solved differently:
> 
>   a) by factoring logging into a library, either as Bruce suggested;
>   b) by replacing logging with simple print in Windows shim;
>   c) by integrating and using the new threading API [1, 2].
> 

What is the issue with the current logging in telemetry on windows? The
current implementation has EAL pass in the logging function to telemetry on
init, allowing use of EAL logging without requiring a link-time dependency.
Theoretically, the same solution could also be used for the threading,
though I'm wary about doing so as using the same solution multiple times
could lead to very awkward APIs.

Overall, I tend to think the proper solution for all this is to split EAL -
something that was discussed previously but never done because it's
difficult and the result may be uncertain. Theoretically, most low-level
services in EAL, such as logging, should be in one library, while the EAL
init code remains in a later, higher-level one.

> * Allow to select listening endpoint and protocol.
>   Different options may be supported on each system and be the default.
>   rte_kvargs syntax and data structure can be used.
> 
> 	--telemetry
> 		Default settings. Also kept for compatibility.
> 	--telemetry transport=seqpacket,protocol=message
> 		AF_UNIX+SOCK_SEQPACKET, no delimiters between objects
> 		(same as default for Unices).
> 	--telemetry transport=tcp,protocol=line,endpoint=:64000
> 		Line-oriented JSON over a TCP socket at local port 64000
> 		(may be same as --telemetry <no argument> on Windows).
> 	--no-telemetry
> 		Disable telemetry. Default on Windows.
> 
>   Parameter names and set are subject to discussion.
> 
>   An last example if a possible future extension
>   as one of the reasons not to fix combinations of "transport="
>   and "protocol=" (similar to socket API):
> 
> 	--telemetry transport=pipe,protocol=line,endpoint=some-name
> 		Stream of objects over a named pipe (fifo), for example.
> 
>   Initial support should cover:
> 
>   1) "transport=seqpacket,protocol=message" on Unices;
>   2) "transport=tcp,protocol=line,endpoint=" at least on Windows,
>      but this variant is really applicable everywhere.
> 
> * dpdk-telemetry.py must support all combinations DPDK supports.
>   Defaults don't change, so backward compatibility is preserved.
> 
> * Because Windows cannot automatically select and endpoint that does not
>   conflict with anything, the default would be to disable telemetry.
>   We also propose to always log telemetry socket path.
> 
> * It is technically possible to accept remote connections.
>   However, it's a new attack surface and a security risk.
>   OTOH, there's no need to allow remote connections:
>   anyone who needs this can setup a tunnel or something.
>   Local socket must be used by default.
> 

Completely agree on not accepting non-local connections. However, I'd also
point out that another reason why a unix socket was chosen over other
TCP/UDP/etc socket options was that it also gave local-user protection.
Right now, if a DPDK process is run as root, no non-root user can connect
to the telemetry socket, unless root explicitly gives permissions. For TCP
or similar sockets, that would not be the case. Not sure how big a deal
that would be.

/Bruce

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Telemetry enhancements and Windows support
  2022-04-04 10:03 ` Bruce Richardson
@ 2022-04-06 21:50   ` Dmitry Kozlyuk
  2022-04-06 21:57     ` Stephen Hemminger
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-06 21:50 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, Vipin Varghese, Ciara Power, Sivaprasad Tummala,
	Tyler Retzlaff, Narcisa Ana Maria Vasile, Dmitry Malloy,
	Pallavi Kadam

2022-04-04 11:03 (UTC+0100), Bruce Richardson:
[...]
> Having EAL be the only one to create threads seems reasonable. However, I'm
> a little uncertain about the scope of change and tying telemetry and EAL
> together a lot more.

Scope is a strong valid argument.
What do you think if telemetry would create threads using Win32 API
just like on Unices it creates threads using pthread
(external component that is always present)?

> Another alternative might be to change telemetry to
> use only one thread for all connections and using "select" or "poll" to
> wait on input from all sockets simultaneously. Would that work on Windows?

Yes, there are select() and "poll()" with slightly different naming.

> > * The logging issue can be solved differently:
> > 
> >   a) by factoring logging into a library, either as Bruce suggested;
> >   b) by replacing logging with simple print in Windows shim;
> >   c) by integrating and using the new threading API [1, 2].
> >   
> 
> What is the issue with the current logging in telemetry on windows?

pthread.h shim contains calls to rte_log().

> The
> current implementation has EAL pass in the logging function to telemetry on
> init, allowing use of EAL logging without requiring a link-time dependency.
> Theoretically, the same solution could also be used for the threading,
> though I'm wary about doing so as using the same solution multiple times
> could lead to very awkward APIs.

Thinking out of scope of this RFC, telemetry *library* API should be:
- register handler
- JSON helpers
- create session
- process command in a session
- destroy session
All IO should be outside.
Otherwise it's not a library but a framework.
rte_telemetry_init() and all IO it starts is a higher-level helper
for the very common case that doesn't need to be in lib/telemetry itself
and even doesn't need to be a public API (but can be).

> Overall, I tend to think the proper solution for all this is to split EAL -
> something that was discussed previously but never done because it's
> difficult and the result may be uncertain. Theoretically, most low-level
> services in EAL, such as logging, should be in one library, while the EAL
> init code remains in a later, higher-level one.

I'd even say: utilities (log, trace, UUID, ...), environment abstraction,
and runtime (init, tasks, multiprocess).
It would be interesting to have a topic on that.

[...]
> > * It is technically possible to accept remote connections.
> >   However, it's a new attack surface and a security risk.
> >   OTOH, there's no need to allow remote connections:
> >   anyone who needs this can setup a tunnel or something.
> >   Local socket must be used by default.
> >   
> 
> Completely agree on not accepting non-local connections. However, I'd also
> point out that another reason why a unix socket was chosen over other
> TCP/UDP/etc socket options was that it also gave local-user protection.
> Right now, if a DPDK process is run as root, no non-root user can connect
> to the telemetry socket, unless root explicitly gives permissions. For TCP
> or similar sockets, that would not be the case. Not sure how big a deal
> that would be.

Windows also provides named pipes:

1. Pipes are subject to access control.

2. Pipes can work in message mode, like SOCK_SEQPACKET.

3. Pipes are addressed by path of form \\.\pipe\NAME
   that is not visible in the file system, like abstract Unix sockets,
   but this namespace is shared, so collision issue remains.

4. Pipes can be accessed as files from Python.
   Message mode can be activated with a Win32 API call
   that can be done using ctypes and msvcrt bundled modules.

5. Pipes can be configured to reject remote connections
   and to automatically limit the number of clients if needed.

Another advantage is that message framing will not be needed this time,
which limits the scope.

Drawbacks:

1. There will be less shared code between Windows and Unix.
   Unix domain sockets and named pipes provide the same
   "accept client, send message, receive message, disconnect" API,
   so it's easy to share all non-transport logic.
   dpdk-telemetry.py will need a few Windows-specific lines.

2. On Windows, select() and WSAPoll() are only applicable to sockets.
   If we decide to multiplex IO in lib/telemetry and use named pipes,
   multiplexers will be different.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Telemetry enhancements and Windows support
  2022-04-06 21:50   ` Dmitry Kozlyuk
@ 2022-04-06 21:57     ` Stephen Hemminger
  2022-04-06 22:03       ` Dmitry Kozlyuk
  0 siblings, 1 reply; 6+ messages in thread
From: Stephen Hemminger @ 2022-04-06 21:57 UTC (permalink / raw)
  To: Dmitry Kozlyuk
  Cc: Bruce Richardson, dev, Vipin Varghese, Ciara Power,
	Sivaprasad Tummala, Tyler Retzlaff, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

On Thu, 7 Apr 2022 00:50:29 +0300
Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:

> 2022-04-04 11:03 (UTC+0100), Bruce Richardson:
> [...]
> > Having EAL be the only one to create threads seems reasonable. However, I'm
> > a little uncertain about the scope of change and tying telemetry and EAL
> > together a lot more.  
> 
> Scope is a strong valid argument.
> What do you think if telemetry would create threads using Win32 API
> just like on Unices it creates threads using pthread
> (external component that is always present)?

Why isn't telemetry using existing rte_ctrl_thread API?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC] Telemetry enhancements and Windows support
  2022-04-06 21:57     ` Stephen Hemminger
@ 2022-04-06 22:03       ` Dmitry Kozlyuk
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Kozlyuk @ 2022-04-06 22:03 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Bruce Richardson, dev, Vipin Varghese, Ciara Power,
	Sivaprasad Tummala, Tyler Retzlaff, Narcisa Ana Maria Vasile,
	Dmitry Malloy, Pallavi Kadam

2022-04-06 14:57 (UTC-0700), Stephen Hemminger:
> On Thu, 7 Apr 2022 00:50:29 +0300
> Dmitry Kozlyuk <dmitry.kozliuk@gmail.com> wrote:
> 
> > 2022-04-04 11:03 (UTC+0100), Bruce Richardson:
> > [...]  
> > > Having EAL be the only one to create threads seems reasonable. However, I'm
> > > a little uncertain about the scope of change and tying telemetry and EAL
> > > together a lot more.    
> > 
> > Scope is a strong valid argument.
> > What do you think if telemetry would create threads using Win32 API
> > just like on Unices it creates threads using pthread
> > (external component that is always present)?  
> 
> Why isn't telemetry using existing rte_ctrl_thread API?

It would depend on EAL at link time then.
But EAL already depends on telemetry and there would be a loop.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-04-06 22:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-01 22:59 [RFC] Telemetry enhancements and Windows support Dmitry Kozlyuk
2022-04-02  0:55 ` Dmitry Kozlyuk
2022-04-04 10:03 ` Bruce Richardson
2022-04-06 21:50   ` Dmitry Kozlyuk
2022-04-06 21:57     ` Stephen Hemminger
2022-04-06 22:03       ` Dmitry Kozlyuk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).