DPDK patches and discussions
From: "Morten Brørup" <mb@smartsharesystems.com>
To: "Andrzej Ostruszka" <aostruszka@marvell.com>, <dev@dpdk.org>
Cc: "Jerin Jacob Kollanukkaran" <jerinj@marvell.com>,
	"Nithin Kumar Dabilpuram" <ndabilpuram@marvell.com>,
	"Pavan Nikhilesh Bhagavatula" <pbhagavatula@marvell.com>,
	"Kiran Kumar Kokkilagadda" <kirankumark@marvell.com>,
	"Krzysztof Kanas" <kkanas@marvell.com>
Subject: Re: [dpdk-dev] [RFC PATCH 0/3] introduce IF proxy library
Date: Tue, 14 Jan 2020 16:16:06 +0100
Message-ID: <98CBD80474FA8B44BF855DF32C47DC35C60CE0@smartserver.smartshare.dk>
In-Reply-To: <20200114142517.29522-1-aostruszka@marvell.com>

Andrzej,

Basically you are adding a very small subset of the Linux IP stack to interface with DPDK applications via callbacks. The library also seems to support interfacing to the route table, so it is not an "interface proxy" but an "IP stack proxy".

You already mention the ARP table as future work. What about namespaces, iptables, and other advanced features? I foresee the devil in the details for any real use case.

Unless the library is an O/S wrapper to make Linux NETLINK-like messages available from other operating systems, I don't really see the value in this library. If it is Linux-specific, why not just use NETLINK in the DPDK application's control plane?


Med venlig hilsen / kind regards
- Morten Brørup

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Andrzej Ostruszka
> Sent: Tuesday, January 14, 2020 3:25 PM
> To: dev@dpdk.org
> Cc: Jerin Jacob Kollanukkaran; Nithin Kumar Dabilpuram; Pavan Nikhilesh
> Bhagavatula; Kiran Kumar Kokkilagadda; Krzysztof Kanas
> Subject: [dpdk-dev] [RFC PATCH 0/3] introduce IF proxy library
> 
> What is this useful for
> =======================
> 
> Usually, when an Ethernet port is assigned to DPDK, it vanishes from the
> system and the user loses the ability to control it via the normal
> configuration utilities (e.g. those from the iproute2 package).
> Moreover, by default a DPDK application is not aware of the network
> configuration of the system.
> 
> To address both of these issues, the application needs to:
> - add some command line interface (or other mechanism) allowing for
>   control of the port and its configuration
> - query the status of the network configuration and monitor its changes
> 
> The purpose of this library is to help with both of these tasks (as
> long as they remain within the domain of configuration available to the
> system).  In other words, if a DPDK application has some special needs
> that cannot be addressed by the normal system configuration utilities,
> then they need to be solved by the application itself.
> 
> The connection between DPDK and the system is based on the existence
> of ports that are visible to both DPDK and the system (like Tap, KNI
> and possibly some other drivers).  These ports serve as interface
> proxies.
> 
> Let's visualize the action of the library by the following example:
> 
>               Linux             |            DPDK
> ==============================================================
>                                 |
>                                 |   +-------+       +-------+
>                                 |   | Port1 |       | Port2 |
> "ip link set dev tap1 mtu 1600" |   +-------+       +-------+
>                           |     |       ^               ^
>                           |  +------+   | mtu_change    |
>                           `->| Tap1 |---' callback      |
>                              +------+                   |
> "ip addr add 198.51.100.14 \    |                       |
>                   dev tap2"     |                       |
>                           |  +------+                   |
>                           `->| Tap2 |-------------------'
>                              +------+   addr_add callback
>                                 |
> "ip route add 192.0.2.0/24 \    |
>                   dev eth0"     |
>                           |     |   route_add callback
>                           `------------->
>                                 |
> 
> So we have two ports, Port1 and Port2, that are not visible to the
> system.  We create two proxy interfaces (here based on the Tap driver)
> and bind the ports to their proxies.  When the user issues a command
> changing the MTU of the Tap1 interface, the library notes this and
> calls the "mtu_change" callback for Port1.  Similarly, when the user
> adds an IPv4 address to the Tap2 interface, the "addr_add" callback is
> called for Port2.  Note also that not only port-related callbacks are
> available; for example, you can also get information about the routing
> table.  See below for a complete list of available callbacks.
> 
> Please note that nothing has been said about forwarding packets
> between the system and DPDK.  Since the proxies are normal DPDK ports,
> you can receive from and send to them via the usual RX/TX burst API.
> However, since the library is not aware of the packet processing
> structure used by the application, it cannot forward packets
> automatically; it is the responsibility of the application to include
> the proxy ports in its packet processing engine.
> 
> As mentioned above, the intention of the library is to:
> - provide information about network configuration that would allow
>   application to decide what to do with the packets received on DPDK
>   ports,
> - allow for control of the ports via standard configuration utilities
> 
> Although the library only helps you identify the proxy for a given
> port (and vice versa) and calls the appropriate callbacks, it does open
> some interesting possibilities.  For example, you can use the proxy
> ports to forward packets for protocols that you do not wish to handle
> in the DPDK application to the system protocol stack, and just listen
> to the configuration changes.  That way you can "offload" the handling
> of those protocols to the system.
> 
> 
> Why this RFC
> ============
> 
> We would like to solicit some input from the community:
> - regarding the usefulness of this library
> - what is missing or what needs to be changed
> - about the currently proposed API
> - any other suggestions and/or improvements are also welcome
> 
> 
> How to use it
> =============
> 
> Usage of this library is rather simple.  You have to:
> 1. Create a proxy (if you don't have a port suitable for being a proxy,
>    or you have one but do not wish to use it as a proxy).
> 2. Bind the port to the proxy.
> 3. Register callbacks.
> 4. Start listening to the network configuration.
> 
> The only mandatory requirement for a DPDK port to be able to act as
> a proxy is that it is visible in the system.  This is checked during
> port-to-proxy binding by calling rte_eth_dev_info_get() on the proxy
> port and inspecting the 'if_index' field (it has to be non-zero).
> One can create such a port in the application by calling:
> 
>   proxy_id = rte_ifpx_create(RTE_IFPX_DEFAULT);
> 
> Upon success this returns the id of the created DPDK proxy port
> (RTE_MAX_ETHPORTS on failure).  The argument selects the type of proxy
> port to create (currently Tap/KNI only).  This function is actually
> just a wrapper around:
> 
>   uint16_t rte_ifpx_create_by_devarg(const char *devarg);
> 
> creating a valid 'devarg' string for the chosen type of proxy.  If you
> have another driver capable of acting as a proxy, you can call
> rte_ifpx_create_by_devarg() directly, passing the appropriate argument.
> 
> Once you have the ids of both the port and the proxy, you can bind the
> two via:
> 
>   rte_ifpx_port_bind(port_id, proxy_id);
> 
> This creates a logical binding; as mentioned above, there is no
> automatic packet forwarding.  With this binding, whenever the user
> changes the state of the proxy interface in the system (link up/down,
> change of MAC/MTU, addition/removal of IPv4/IPv6 addresses), you get
> the appropriate callback called for the bound port.
> 
> So far we've mentioned several times that the library calls callbacks.
> They are grouped in 'struct rte_ifpx_callbacks' and the user provides
> them to the library via:
> 
>   rte_ifpx_callbacks_register(&cbs);
> 
> It is worth mentioning that the context (lcore/thread) in which these
> callbacks are called is implementation defined.  It might differ
> between platforms, so the application needs to assume that some kind
> of inter-lcore/thread synchronization or communication is required.
> 
> Once we have bindings in place and callbacks registered, the only
> essential part that remains is to get the current network configuration
> and start listening to its changes.  This is accomplished via a call
> to:
> 
>   rte_ifpx_listen();
> 
> And basically this is all one needs to understand how to use this
> library.  Other less essential parts include:
> - ability to query what callbacks are available for a given platform
> - getting the mapping between proxy and port
> - unbinding ports from a proxy
> - destroying a proxy port
> - closing the listening service
> - getting basic information about a proxy
> 
> 
> Currently available features and implementation
> ===============================================
> 
> The library's API is system independent, but it obviously needs some
> system-dependent parts.  We provide an exemplary Linux implementation
> (based on netlink sockets).  A very similar implementation is possible
> for FreeBSD (using PF_ROUTE sockets).  A Windows implementation would
> need to differ considerably (the IP Helper library would probably be
> of some help).
> 
> Here is the list of currently implemented callbacks:
> 
> struct rte_ifpx_callbacks {
>   void (*mac_change)(uint16_t port_id, const struct rte_ether_addr *mac);
>   void (*mtu_change)(uint16_t port_id, uint16_t mtu);
>   void (*link_change)(uint16_t port_id, int is_up);
>   void (*addr_add)(uint16_t port_id, uint32_t ip);
>   void (*addr_del)(uint16_t port_id, uint32_t ip);
>   void (*addr6_add)(uint16_t port_id, const uint8_t *ip);
>   void (*addr6_del)(uint16_t port_id, const uint8_t *ip);
>   void (*route_add)(uint32_t ip, uint8_t depth);
>   void (*route_del)(uint32_t ip, uint8_t depth);
>   void (*route6_add)(const uint8_t *ip, uint8_t depth);
>   void (*route6_del)(const uint8_t *ip, uint8_t depth);
>   void (*cfg_finished)(void);
> };
> 
> They are all rather self-descriptive, with the exception of the last
> one.  When the user calls rte_ifpx_listen(), the library first queries
> the system for its current configuration.  That might require several
> request/reply exchanges between DPDK and the system, and once it is
> finished this callback is called to let the application know that all
> information has been gathered.
> 
> BTW, at the moment all IPv4 addresses are passed in host byte order.
> 
> It is also worth mentioning that while the typical case would be a
> 1-to-1 mapping between port and proxy, a 1-to-many mapping is also
> supported.  In that case port-related callbacks will be called for each
> port bound to the given proxy interface, and it is the application's
> responsibility to define the semantics of such a mapping (e.g. all
> changes apply to all ports, or link changes apply to all ports while
> the others are accepted in "round robin" fashion, or ...).
> 
> As mentioned above, the Linux implementation is based on a netlink
> socket.  This socket is registered as a file descriptor in EAL
> interrupts (similarly to how EAL alarms are implemented).
> 
> 
> What is inside this RFC
> =======================
> - 1 commit for the API
> - 1 commit for the implementation; this is just to show a PoC and
>   allow early experimentation with the idea (e.g. running the
>   test/example from the next commit)
> - 1 commit for the test/example, just to show how this can be used
> 
> 
> Next steps
> ==========
> 
> - gather community feedback
> - polish the implementation:
>   * call the notification callbacks without the lock held (at the
>     moment, attempts to modify callbacks from within a callback would
>     deadlock)
>   * separate the system-dependent parts from the rest so that it is
>     easy to figure out what needs to be reimplemented on other platforms
>   * apply community suggestions, if any
> - add neighbour callbacks (ARP table)
> 
> Best regards
> Andrzej Ostruszka
> 
> Andrzej Ostruszka (3):
>   lib: introduce IF proxy library (API)
>   if_proxy: add preliminary Linux implementation
>   if_proxy: add example, test and documentation
> 
>  app/test/Makefile                             |   5 +
>  app/test/meson.build                          |   1 +
>  app/test/test_if_proxy.c                      | 431 ++++++++++
>  config/common_base                            |   5 +
>  doc/guides/prog_guide/if_proxy_lib.rst        | 103 +++
>  doc/guides/prog_guide/index.rst               |   1 +
>  examples/Makefile                             |   1 +
>  examples/if_proxy/Makefile                    |  58 ++
>  examples/if_proxy/main.c                      | 203 +++++
>  examples/if_proxy/meson.build                 |  12 +
>  examples/meson.build                          |   2 +-
>  lib/Makefile                                  |   2 +
>  .../common/include/rte_eal_interrupts.h       |   2 +
>  lib/librte_eal/linux/eal/eal_interrupts.c     |  14 +-
>  lib/librte_if_proxy/Makefile                  |  25 +
>  lib/librte_if_proxy/meson.build               |   7 +
>  lib/librte_if_proxy/rte_if_proxy.c            | 803 ++++++++++++++++++
>  lib/librte_if_proxy/rte_if_proxy.h            | 364 ++++++++
>  lib/meson.build                               |   2 +-
>  19 files changed, 2035 insertions(+), 6 deletions(-)
>  create mode 100644 app/test/test_if_proxy.c
>  create mode 100644 doc/guides/prog_guide/if_proxy_lib.rst
>  create mode 100644 examples/if_proxy/Makefile
>  create mode 100644 examples/if_proxy/main.c
>  create mode 100644 examples/if_proxy/meson.build
>  create mode 100644 lib/librte_if_proxy/Makefile
>  create mode 100644 lib/librte_if_proxy/meson.build
>  create mode 100644 lib/librte_if_proxy/rte_if_proxy.c
>  create mode 100644 lib/librte_if_proxy/rte_if_proxy.h
> 
> --
> 2.17.1
> 


Thread overview: 21+ messages
2020-01-14 14:25 Andrzej Ostruszka
2020-01-14 14:25 ` [dpdk-dev] [RFC PATCH 1/3] lib: introduce IF proxy library (API) Andrzej Ostruszka
2020-01-14 14:25 ` [dpdk-dev] [RFC PATCH 2/3] if_proxy: add preliminary Linux implementation Andrzej Ostruszka
2020-01-14 14:25 ` [dpdk-dev] [RFC PATCH 3/3] if_proxy: add example, test and documentation Andrzej Ostruszka
2020-01-14 15:16 ` Morten Brørup [this message]
2020-01-14 17:38   ` [dpdk-dev] [RFC PATCH 0/3] introduce IF proxy library Andrzej Ostruszka
2020-01-15 10:15     ` Bruce Richardson
2020-01-15 11:27       ` Jerin Jacob
2020-01-15 12:28       ` Morten Brørup
2020-01-15 12:57         ` Jerin Jacob
2020-01-15 15:30           ` Morten Brørup
2020-01-15 16:04             ` Jerin Jacob
2020-01-15 18:15               ` Morten Brørup
2020-01-16  7:15                 ` Jerin Jacob
2020-01-16  9:11                   ` Morten Brørup
2020-01-16  9:09                 ` Andrzej Ostruszka
2020-01-16  9:30                   ` Morten Brørup
2020-01-16 10:42                     ` Andrzej Ostruszka
2020-01-16 10:58                       ` Morten Brørup
2020-01-16 12:06                         ` Andrzej Ostruszka
2020-01-15 14:09         ` Bruce Richardson
