DPDK patches and discussions
* graph: make the in-built nodes better reusable
@ 2024-10-21 12:53 Robin Jarry
  2024-10-22  3:52 ` [EXTERNAL] " Nitin Saxena
  0 siblings, 1 reply; 2+ messages in thread
From: Robin Jarry @ 2024-10-21 12:53 UTC (permalink / raw)
  To: dev, grout; +Cc: Jerin Jacob, Nitin Saxena, David Marchand, Christophe Fontaine

Hi all,

I am starting this discussion to see what can be done in order to make 
the in-built nodes (i.e. https://git.dpdk.org/dpdk/tree/lib/node) easier 
to reuse in external applications.

So far here are the limitations I have found when working on grout. Some 
of these limitations are trivial, some others are more tricky. I hope we 
can get clean solutions.

ethdev_rx and ethdev_tx require cloning
---------------------------------------

These nodes have been written to receive from or transmit to a single
queue. When the number of ports and/or rx/tx queues changes, the graph
needs to be recreated.

* Node names must all be unique (hence, node clones need to have 
  different names than their original).

  => There is a routine that automatically adds a "unique" suffix to the
  cloned ethdev_rx and ethdev_tx names. The "ethdev_rx-<port>-<queue>"
  and "ethdev_tx-<port>" name patterns are used.

  => It is not possible to prepare the new nodes in advance without
  destroying the active graph. For example, if one port+queue isn't
  changed, the "ethdev_rx-<port>-<queue>" name will already exist and be
  active in the graph. Reusing the same name could lead to data races.

* Node context data cannot be passed during rte_graph_create() or 
  rte_graph_clone().

  Instead, each node init() callback must determine its own context data 
  based on the graph and node pointers it has. In most cases, it is 
  trivial, but not for nodes that have multiple copies per graph.

* Once created/cloned, nodes cannot be destroyed.

  => Removing ports and/or reducing the number of queues results in node
  clones remaining. It isn't the end of the world but it would be nice
  to allow destroying them for clarity.

* Graph statistics are per-node.

  => When cloning nodes, we end up with obscure node names such as
  "ethdev_rx-4-7" in graph statistics. It would be clearer if the clones
  were collapsed in the statistics. Having clones is an implementation
  detail which shouldn't be reflected in the results.

  Same with the DOT graph dump: clones make the graph images bloated and
  also give a different image per worker. It would be clearer if only
  the original node name was used.
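
To illustrate the statistics point, here is a minimal sketch of
collapsing clone counters under their original node name. It does not
use the rte_graph stats API: struct node_stat, clone_base_name() and
collapse_stats() are hypothetical names made up for this example, and it
assumes that original node names never contain a '-' so that everything
before the first '-' is the base name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BASE_NAME_MAX 64

/* Hypothetical per-node counters; NOT the rte_graph stats structures. */
struct node_stat {
	const char *name; /* e.g. "ethdev_rx-4-7" */
	uint64_t objs;    /* packets processed */
	uint64_t calls;   /* process() invocations */
};

/* Copy the part of a node name before the first '-' into base. */
static void
clone_base_name(const char *name, char *base, size_t len)
{
	size_t n = strcspn(name, "-");

	if (n >= len)
		n = len - 1; /* truncate overly long names */
	memcpy(base, name, n);
	base[n] = '\0';
}

/*
 * Merge the entries of in[] that share a base name into out[].
 * names[] provides storage for the base names referenced by out[].
 * Returns the number of entries written to out[] (at most out_max).
 */
static size_t
collapse_stats(const struct node_stat *in, size_t n_in,
	       struct node_stat *out, char names[][BASE_NAME_MAX],
	       size_t out_max)
{
	size_t n_out = 0;

	for (size_t i = 0; i < n_in; i++) {
		char base[BASE_NAME_MAX];
		size_t j;

		clone_base_name(in[i].name, base, sizeof(base));
		/* Find an existing output entry with the same base name. */
		for (j = 0; j < n_out; j++)
			if (strcmp(out[j].name, base) == 0)
				break;
		if (j == n_out) {
			if (n_out == out_max)
				continue; /* no room left: skip this entry */
			strcpy(names[j], base);
			out[j].name = names[j];
			out[j].objs = 0;
			out[j].calls = 0;
			n_out++;
		}
		out[j].objs += in[i].objs;
		out[j].calls += in[i].calls;
	}
	return n_out;
}
```

With this, "ethdev_rx-0-0" and "ethdev_rx-4-7" would both be reported
under a single "ethdev_rx" line. The same base name could be used when
generating the DOT dump.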

ip* nodes assume the mbuf data offset is at 0
---------------------------------------------

L3 and L4 nodes assume that the mbufs they process have their data
offsets pointing to an Ethernet header.

This prevents implementing IP tunnels or control to data plane 
communication where the data pointer may need to be at the end of the 
L3 header, for example.

If we change that to adjust the data pointer to the correct OSI layer, 
it would also mandate that each individual node only deals with a single 
OSI layer.

This means that the current ip*_rewrite nodes would need to be split in 
two: ip*_output and eth_output. This may have big implications on the 
optimizations that were made in these nodes.
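
As a rough sketch of what "one node per OSI layer" could look like: the
L2 input node strips the Ethernet header, L3 code only ever looks at the
current data pointer, and the L2 output node puts room for an Ethernet
header back. struct pkt and all function names below are hypothetical
stand-ins, not the rte_mbuf API; with real mbufs the offset changes
would presumably be done with rte_pktmbuf_adj() and
rte_pktmbuf_prepend().

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14 /* dst MAC + src MAC + EtherType, no VLAN tag */

/* Hypothetical stand-in for the few mbuf fields this sketch needs. */
struct pkt {
	uint8_t buf[256];
	uint16_t data_off; /* offset of the current layer's header in buf */
	uint16_t data_len; /* bytes from data_off to the end of the packet */
};

/* Pointer to the header of the layer the packet is currently at. */
static uint8_t *
pkt_data(struct pkt *p)
{
	return p->buf + p->data_off;
}

/* L2 input: strip the Ethernet header. Returns -1 if too short. */
static int
eth_input(struct pkt *p)
{
	if (p->data_len < ETH_HDR_LEN)
		return -1;
	p->data_off += ETH_HDR_LEN;
	p->data_len -= ETH_HDR_LEN;
	return 0;
}

/* L2 output: expose room for an Ethernet header before the L3 header. */
static int
eth_output(struct pkt *p)
{
	if (p->data_off < ETH_HDR_LEN)
		return -1; /* not enough headroom */
	p->data_off -= ETH_HDR_LEN;
	p->data_len += ETH_HDR_LEN;
	return 0;
}

/* L3 code: the IPv4 TTL is byte 8 of whatever pkt_data() points to. */
static uint8_t
ip4_ttl(struct pkt *p)
{
	return pkt_data(p)[8];
}
```

The point is that ip4_ttl() does not need to know whether the packet
came from an Ethernet port, a tunnel decapsulation node or the control
plane, as long as whoever handed it over left the data pointer on the
IPv4 header.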

No explicit API to pass application specific data around
--------------------------------------------------------

This one is more a documentation issue. It would help if there was 
a clear description of how the in-built nodes work together and what 
kind of mbuf private data they require in order to function properly.

Next nodes are hard coded
-------------------------

All next-nodes are set in the in-built nodes. This prevents reusing only
a subset of them.
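
What I have in mind is something along these lines, where the
application can redirect an edge of an in-built node to one of its own
nodes before the graph is created. This is only a sketch: struct
node_desc, node_set_next(), node_edge_index() and the "my_ip4_output"
node are hypothetical, and the edge names are only illustrative. There
is rte_node_edge_update() in the graph library which may be a starting
point, but the sketch below does not use it.

```c
#include <stddef.h>
#include <string.h>

#define MAX_EDGES 8

/* Hypothetical node descriptor, NOT struct rte_node_register. */
struct node_desc {
	const char *name;
	const char *next_nodes[MAX_EDGES]; /* edge index -> next node name */
	size_t nb_edges;
};

/* Point an existing edge at another node. Returns -1 if out of range. */
static int
node_set_next(struct node_desc *n, size_t edge, const char *next_name)
{
	if (edge >= n->nb_edges)
		return -1;
	n->next_nodes[edge] = next_name;
	return 0;
}

/* Find the edge index leading to next_name. Returns -1 if none. */
static int
node_edge_index(const struct node_desc *n, const char *next_name)
{
	for (size_t i = 0; i < n->nb_edges; i++)
		if (strcmp(n->next_nodes[i], next_name) == 0)
			return (int)i;
	return -1;
}
```

For example, an application that only wants the lookup node could call
node_set_next(&lookup, 0, "my_ip4_output") to replace the default
rewrite edge and leave the drop edge untouched.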

No support for logical interfaces
---------------------------------

All interfaces are supposed to be DPDK ports (e.g. IP next hops contain 
destination Ethernet addresses and DPDK port IDs). This prevents support 
of logical interfaces such as IP tunnels.

No support for multiple VRF
---------------------------

There is a single lpm/lpm6 instance for all ports. This is sort of
linked to the previous limitation about not having support for logical
interfaces. Ideally, the lpm/lpm6 instance should be determined from the
VRF identifier of the input interface (nb: NOT the same thing as the
receiving DPDK port).
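
To make the idea concrete, here is a minimal sketch where the route
table is selected from the VRF of the input interface. Everything in it
is hypothetical: struct iface, struct route_table and the functions are
made up for the example, and the linear longest-prefix match is only a
stand-in for one lpm instance per VRF.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_VRFS 4
#define MAX_ROUTES 8
#define NO_ROUTE UINT32_MAX

struct route {
	uint32_t prefix; /* host byte order, already masked */
	uint8_t depth;   /* prefix length, 0 to 32 */
	uint32_t next_hop;
};

/* Hypothetical stand-in for one lpm instance. */
struct route_table {
	struct route routes[MAX_ROUTES];
	size_t n;
};

/* An input interface (physical or logical) only carries a VRF id. */
struct iface {
	uint16_t vrf_id;
};

static uint32_t
prefix_mask(uint8_t depth)
{
	/* A shift by 32 is undefined, so depth 0 is handled separately. */
	return depth == 0 ? 0 : (uint32_t)(UINT32_MAX << (32 - depth));
}

static int
route_add(struct route_table *t, uint32_t prefix, uint8_t depth,
	  uint32_t next_hop)
{
	if (t->n == MAX_ROUTES || depth > 32)
		return -1;
	t->routes[t->n].prefix = prefix & prefix_mask(depth);
	t->routes[t->n].depth = depth;
	t->routes[t->n].next_hop = next_hop;
	t->n++;
	return 0;
}

/* Linear longest-prefix match. Returns NO_ROUTE when nothing matches. */
static uint32_t
route_lookup(const struct route_table *t, uint32_t addr)
{
	uint32_t next_hop = NO_ROUTE;
	int best = -1;

	for (size_t i = 0; i < t->n; i++) {
		const struct route *r = &t->routes[i];

		if ((addr & prefix_mask(r->depth)) == r->prefix &&
		    (int)r->depth > best) {
			best = r->depth;
			next_hop = r->next_hop;
		}
	}
	return next_hop;
}

/* The table is chosen from the input interface VRF, not from a port. */
static uint32_t
vrf_lookup(const struct route_table *vrfs, const struct iface *in,
	   uint32_t addr)
{
	if (in->vrf_id >= MAX_VRFS)
		return NO_ROUTE;
	return route_lookup(&vrfs[in->vrf_id], addr);
}
```

Two interfaces in different VRFs can then resolve the same destination
address to different next hops, even with overlapping prefixes.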

Cheers!

-- 
Robin

